Thermal tolerant avicelase from Acidothermus cellulolyticus

ABSTRACT

The invention provides a thermal tolerant (thermostable) cellulase that is a member of the glycoside hydrolase family. The invention further discloses this cellulase as AviIII. AviIII has been isolated and characterized from Acidothermus cellulolyticus. The invention further provides recombinant forms of the identified AviIII. Methods of making and using AviIII polypeptides, including fusions, variants, and derivatives, are also disclosed.

GOVERNMENT INTERESTS

[0001] The United States Government has rights in this invention under Contract No. DE-AC36-99GO10337 between the United States Department of Energy and the National Renewable Energy Laboratory, a Division of the Midwest Research Institute.

FIELD OF THE INVENTION

[0002] The invention generally relates to a novel avicelase from Acidothermus cellulolyticus, AviIII. More specifically, the invention relates to purified and isolated AviIII polypeptides, nucleic acid molecules encoding the polypeptides, and processes for production and use of AviIII, as well as variants and derivatives thereof.

BACKGROUND OF THE INVENTION

[0003] Plant biomass as a source of energy production can include agricultural and forestry products, associated by-products and waste, municipal solid waste, and industrial waste. In addition, over 50 million acres in the United States are currently available for biomass production, and there are a number of terrestrial and aquatic crops grown solely as a source for biomass (A Wiselogel, et al. Biomass feedstocks resources and composition. In C E Wyman, ed. Handbook on Bioethanol: Production and Utilization. Washington, D.C.: Taylor & Francis, 1996, pp 105-118). Biofuels produced from biomass include ethanol, methanol, biodiesel, and additives for reformulated gasoline. Biofuels are desirable because they add little, if any, net carbon dioxide to the atmosphere and because they greatly reduce ozone formation and carbon monoxide emissions as compared to the environmental output of conventional fuels (P Bergeron. Environmental impacts of bioethanol. In C E Wyman, ed. Handbook on Bioethanol: Production and Utilization. Washington, D.C.: Taylor & Francis, 1996, pp 90-103).

[0004] Plant biomass is the most abundant source of carbohydrate in the world due to the lignocellulosic materials composing the cell walls of all higher plants. Plant cell walls are divided into two sections, the primary and the secondary cell walls. The primary cell wall, which provides structure for expanding cells (and hence changes as the cell grows), is composed of three major polysaccharides and one group of glycoproteins. The predominant polysaccharide, and most abundant source of carbohydrates, is cellulose, while hemicellulose and pectin are also found in abundance. Cellulose is a linear beta-(1,4)-D-glucan and comprises 20% to 30% of the primary cell wall by weight. The secondary cell wall, which is produced after the cell has completed growing, also contains polysaccharides and is strengthened through polymeric lignin covalently cross-linked to hemicellulose.

[0005] Carbohydrates, and cellulose in particular, can be converted to sugars by well-known methods including acid and enzymatic hydrolysis. Enzymatic hydrolysis of cellulose requires the processing of biomass to reduce size and facilitate subsequent handling. Mild acid treatment is then used to hydrolyze part or all of the hemicellulose content of the feedstock. Finally, cellulose is converted to ethanol through the concerted action of cellulases and saccharolytic fermentation (simultaneous saccharification fermentation (SSF)). The SSF process, using the yeast Saccharomyces cerevisiae for example, is often incomplete, as it does not utilize the entire sugar content of the plant biomass, namely the hemicellulose fraction.

[0006] The cost of producing ethanol from biomass can be divided into three areas of expenditure: pretreatment costs, fermentation costs, and other costs. Pretreatment costs include biomass milling, pretreatment reagents, equipment maintenance, power and water, and waste neutralization and disposal. The fermentation costs can include enzymes, nutrient supplements, yeast, maintenance and scale-up, and waste disposal. Other costs include biomass purchase, transportation and storage, plant labor, plant utilities, ethanol distillation, and administration (which may include technology-use licenses). One of the major expenses incurred in SSF is the cost of the enzymes, as about one kilogram of cellulase is required to fully digest 50 kilograms of cellulose. Economical production of cellulase is also complicated by factors such as the relatively slow growth rates of cellulase-producing organisms, levels of cellulase expression, and the tendency of enzyme-dependent processes to partially or completely inactivate enzymes due to conditions such as elevated temperature, acidity, proteolytic degradation, and solvent degradation.
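As a rough illustration of the enzyme loading figure given above (about one kilogram of cellulase per 50 kilograms of cellulose), the required enzyme mass for a given batch can be estimated as sketched below. The function name and the default ratio parameter are assumptions introduced for this sketch; actual loadings would depend on the enzyme preparation and process conditions.

```python
def cellulase_required_kg(cellulose_kg: float,
                          kg_cellulose_per_kg_enzyme: float = 50.0) -> float:
    """Return the approximate cellulase mass (kg) needed to digest cellulose_kg.

    The default ratio (50 kg cellulose per kg cellulase) is the approximate
    figure quoted in the text; it is an estimate, not a measured constant.
    """
    if cellulose_kg < 0 or kg_cellulose_per_kg_enzyme <= 0:
        raise ValueError("mass must be non-negative and the ratio positive")
    return cellulose_kg / kg_cellulose_per_kg_enzyme


if __name__ == "__main__":
    for batch_kg in (50.0, 1000.0):
        enzyme_kg = cellulase_required_kg(batch_kg)
        print(f"{batch_kg:g} kg cellulose -> about {enzyme_kg:g} kg cellulase")
```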

[0007] Enzymatic degradation of cellulose requires the coordinate action of at least three different types of cellulases. Such enzymes are given an Enzyme Commission (EC) designation according to the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (Eur. J. Biochem. 264: 607-609 and 610-650, 1999). Endo-beta-(1,4)-glucanases (EC 3.2.1.4) cleave the cellulose strand randomly along its length, thus generating new chain ends. Exo-beta-(1,4)-glucanases (EC 3.2.1.91) are processive enzymes and cleave cellobiosyl units (beta-(1,4)-glucose dimers) from free ends of cellulose strands. Lastly, beta-D-glucosidases (cellobiases: EC 3.2.1.21) hydrolyze cellobiose to glucose. All three of these general activities are required for efficient and complete hydrolysis of a polymer such as cellulose to a subunit, such as the simple sugar, glucose.

[0008] Highly thermostable enzymes have been isolated from the cellulolytic thermophile Acidothermus cellulolyticus gen. nov., sp. nov., a bacterium originally isolated from decaying wood in an acidic, thermal pool at Yellowstone National Park. A. Mohagheghi et al., (1986) Int. J. Systematic Bacteriology, 36(3): 435-443. One cellulase enzyme produced by this organism, the endoglucanase E1, is known to display maximal activity at 75° C. to 83° C. M. P. Tucker et al. (1989), Bio/Technology, 7(8): 817-820. E1 endoglucanase has been described in U.S. Pat. No. 5,275,944. The A. cellulolyticus E1 endoglucanase is an active cellulase; in combination with the exocellulase CBH I from Trichoderma reesei, E1 gives a high level of saccharification and contributes to a degree of synergism. Baker J O et al. (1994), Appl. Biochem. Biotechnol., 45/46: 245-256. The gene encoding the E1 catalytic and carbohydrate binding domains and linker peptide was described in U.S. Pat. No. 5,536,655. E1 has also been expressed as a stable, active enzyme from a wide variety of hosts, including E. coli, Streptomyces lividans, Pichia pastoris, cotton, tobacco, and Arabidopsis (Dai Z, Hooker B S, Anderson D B, Thomas S R. Transgenic Res. 2000 February; 9(1):43-54).

[0009] The potential exists for the successful, commercial-scale expression of heterologous cellulases, and in particular novel cellulases with or without any one or more desirable properties such as thermal tolerance and resistance to acid inactivation, proteolytic inactivation, and solvent inactivation. Such expression can occur in filamentous fungi, bacteria, and other hosts.

[0010] There is a need within the art to generate alternative cellulase enzymes capable of assisting in the commercial-scale processing of cellulose to sugar for use in biofuel production. Against this backdrop the present invention has been developed. The potential exists for the successful, commercial-scale expression of heterologous cellulase polypeptides, and in particular novel cellulase polypeptides with or without any one or more desirable properties such as thermal tolerance, and partial or complete resistance to extreme pH inactivation, proteolytic inactivation, solvent inactivation, chaotropic agent inactivation, oxidizing agent inactivation, and detergent inactivation. Such expression can occur in fungi, bacteria, and other hosts.

SUMMARY OF THE INVENTION

[0011] The present invention provides AviIII, a novel member of the glycoside hydrolase (GH) family of enzymes, and in particular a thermal tolerant glycoside hydrolase useful in the degradation of cellulose. AviIII polypeptides of the invention include those having an amino acid sequence shown in SEQ ID NO: 1, as well as polypeptides having substantial amino acid sequence identity to the amino acid sequence of SEQ ID NO: 1 and useful fragments thereof, including a catalytic domain having significant sequence similarity to the GH74 family, a first carbohydrate binding domain (type II) and a second carbohydrate binding domain (type III). See FIG. 1.

[0012] The invention also provides a polynucleotide molecule encoding AviIII polypeptides and fragments of AviIII polypeptides, for example catalytic and carbohydrate binding domains. Polynucleotide molecules of the invention include those molecules having a nucleic acid sequence as shown in SEQ ID NO: 2; those that hybridize to the nucleic acid sequence of SEQ ID NO: 2 under high stringency conditions; and those having substantial nucleic acid identity with the nucleic acid sequence of SEQ ID NO: 2.

[0013] The invention includes variants and derivatives of the AviIII polypeptides, including fusion proteins. For example, fusion proteins of the invention include AviIII polypeptide fused to a heterologous protein or peptide that confers a desired function. The heterologous protein or peptide can facilitate purification, oligomerization, stabilization, or secretion of the AviIII polypeptide, for example. As further examples, the heterologous polypeptide can provide enhanced activity, including catalytic or binding activity, for AviIII polypeptides, where the enhancement is either additive or synergistic. A fusion protein of an embodiment of the invention can be produced, for example, from an expression construct containing a polynucleotide molecule encoding AviIII polypeptide in frame with a polynucleotide molecule for the heterologous protein. Embodiments of the invention also comprise vectors, plasmids, expression systems, host cells, and the like, containing an AviIII polynucleotide molecule. Genetic engineering methods for the production of AviIII polypeptides of embodiments of the invention include expression of a polynucleotide molecule in cell free expression systems and in cellular hosts, according to known methods.

[0014] The invention further includes compositions containing a substantially purified AviIII polypeptide of the invention and a carrier. Such compositions are administered to a biomass containing cellulose for the reduction or degradation of the cellulose.

[0015] The invention also provides reagents, compositions, and methods that are useful for analysis of AviIII activity.

[0016] These and various other features as well as advantages which characterize the present invention will be apparent from a reading of the following detailed description and a review of the associated drawings.

[0017] The following Tables 4 and 5 include sequences used in describing embodiments of the present invention. In Table 4, the abbreviations are as follows: CD, catalytic domain; CBD_II, carbohydrate binding domain type II; CBD_III, carbohydrate binding domain type III; and FN-III, fibronectin domain type III. When used herein, N* indicates a string of unknown nucleic acid units, and X* indicates a string of unknown amino acid units, for example about 50 or more. Table 4 includes approximate start and stop information for segments, and Table 5 includes amino acid sequence data for segments.

TABLE 4. Nucleotide and polypeptide segments.

AviIII segment      Base BEGIN  Base END    Length, bp    aa BEGIN    aa END         Length, aa  SEQ ID No.    SEQ ID No.
                                                          (No., aa)   (No., aa)                  (amino acid)  (nucleotide)
Total length        1           about 3000  about 3 kb    1 M         about 1000 X*  about 1 kb  1             2
Signal (potential)  1           108         108           1 M         36 A           36
CD (GH74)           109         2328        2220          37 A        776 G          740         3
CBD_III (partial)   2575        about 3000  about 0.5 kb  859 V       about 1000 X*  about 154   4
CBD_III (partial)   2575        2838        264           859 V       946 Q          88          5
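The relationship between the base and amino acid coordinates in Table 4 can be checked with a short calculation, assuming (as the table values suggest) that base 1 is the first base of the first codon and that numbering is 1-based and inclusive. The names below are hypothetical and introduced only for this sketch; only the segments with exact coordinates are included.

```python
# Exact (non-approximate) segment coordinates taken from Table 4: (base BEGIN, base END).
SEGMENTS = {
    "Signal (potential)": (1, 108),
    "CD (GH74)": (109, 2328),
    "CBD_III (partial)": (2575, 2838),
}


def base_to_residue(base: int) -> int:
    """Map a 1-based base position to the 1-based residue whose codon contains it."""
    if base < 1:
        raise ValueError("base positions are 1-based")
    return (base - 1) // 3 + 1


def segment_summary(begin: int, end: int) -> dict:
    """Compute lengths and residue coordinates for an in-frame segment."""
    length_bp = end - begin + 1
    if length_bp % 3 != 0:
        raise ValueError("segment length is not a whole number of codons")
    return {
        "length_bp": length_bp,
        "aa_begin": base_to_residue(begin),
        "aa_end": base_to_residue(end),
        "length_aa": length_bp // 3,
    }


if __name__ == "__main__":
    for name, (begin, end) in SEGMENTS.items():
        print(name, segment_summary(begin, end))
```

Under these assumptions the computed values agree with the table: for example, bases 109 to 2328 span 2220 bp and correspond to residues 37 to 776 (740 amino acids).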

[0018] TABLE 5. Gene/polypeptide segments with amino acid sequences.

AviIII segment: Total length
SEQ ID No. (amino acid): 1
SEQ ID No. (nucleotide): 2
Segment data: SEQ ID NO: 1 (see TABLE 1); SEQ ID NO: 2 (see TABLE 2)

AviIII segment: Signal (potential)
Segment data: M RSRRLVSLLAATASFAVAAALGVL PI AITASPAH A

AviIII segment: CD (GH74)
SEQ ID No. (amino acid): 3
Segment data: ATTQPYTWSNVAIGGGGFVDGIVFNEGAPGILYVRTDIGGMYRWDAANGRWIPLLDWVGWNNWGYNGVVSIAADPINTNKVWAAVGMYTNSWDPNDGAILRSSDQGATWQITPLPFKLGGNMPGRGMGERLAVPNNDILYFGAPSGKGLWRSTDSGATWSQMTNFPDVGTYIANPTDTTGYQSDIQGVVWVAFDKSSSSLGQASKTIFVGVADPNNPVFWSRDGGATWQAVGAPTGFIPHKGVFDPVNHVLYIATSNTGGPYDGSSGDVWKFSVTSGTWTRISPVSTDTANDYFGYSGLTIDRQHPNTIMVATQISWWPDTIIFRSTDGGATWTRIWDWTSYPNRSLRYVLDISAEPWLTFGVQPNPPVPSPKLGWMDEAMAIDPFNSDRMLYGTGATLYATNDLTKWDSGGQIHIAPMVKGLEETAVNDLISPPSGAPLISALGDLGGTHADVTAVPSTIFTSPVFTTGTSVDYAELNPSIIVRAGSFDPSSQPNDRHVAFSTDGGKNWFQGSEPGGVTTGGTVAASADGSRFVWAPGDPGQPVVYAVGFGNSWAASQGVPANAQIRSDRVNPKTFYALSNGTFYRSTDGGVTFQPVAAGLPSSGAVGVMFHAVPGKEGDLWLAASSGLYHSTNGGSSWSAITGVSSAVNVGFGKSAPGSSYPAVFVVGTIGGVTGAYRSDDCGTTWVLINDDQHQYGNWGQAITGDHANLRRVYIGTNGRGI V YGIGGAPS G

AviIII segment: CBD_III (partial)
SEQ ID No. (amino acid): 4
Segment data: V SGGVKVQYKNNDSAPGDNQIKPGLQVVNTGSSSVDLSTVTVRYWFTRDGGSSTLVYNCDWAAIGCGNIRASFGSVNPATPTADTYLQX*

AviIII segment: CBD_III (partial)
SEQ ID No. (amino acid): 5
Segment data: VSGGVKVQYKNNDSAPGDNQIKPGLQVVNTGSSSVDLSTVTVRYWFTRDGGSSTLVYNCDWAAIGCGNIRASFGSVNPATPTADTYLQ

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] FIG. 1 is a schematic representation of the gene sequence and amino acid segment organization.

[0020] FIG. 2 is a graphic representation of the glycoside hydrolase gene/protein families found in various organisms.

DETAILED DESCRIPTION

[0021] Definitions:

[0022] The following definitions are provided to facilitate understanding of certain terms used frequently herein and are not meant to limit the scope of the present disclosure:

[0023] “Amino acid” refers to any of the twenty naturally occurring amino acids as well as any modified amino acid sequences. Modifications may include natural processes such as posttranslational processing, or may include chemical modifications which are known in the art. Modifications include but are not limited to: phosphorylation, ubiquitination, acetylation, amidation, glycosylation, covalent attachment of flavin, ADP-ribosylation, cross linking, iodination, methylation, and the like.

[0024] “Antibody” refers to a Y-shaped molecule having a pair of antigen binding sites, a hinge region and a constant region. Fragments of antibodies, for example an antigen binding fragment (Fab), chimeric antibodies, antibodies having a human constant region coupled to a murine antigen binding region, and fragments thereof, as well as other well known recombinant antibodies are included in the present invention.

[0025] “Antisense” refers to polynucleotide sequences that are complementary to a target “sense” polynucleotide sequence.

[0026] “Binding activity” refers to any activity that can be assayed by characterizing the ability of a polypeptide to bind to a substrate. The substrate can be a polymer such as cellulose or can be a complex molecule or aggregate of molecules where the entire moiety comprises at least some cellulose.

[0027] “Cellulase activity” refers to any activity that can be assayed by characterizing the enzymatic activity of a cellulase. For example, cellulase activity can be assayed by determining how much reducing sugar is produced during a fixed amount of time for a set amount of enzyme (see Irwin et al., (1998) J. Bacteriology, 1709-1714). Other assays are well known in the art and can be substituted.

[0028] “Complementary” or “complementarity” refers to the ability of a polynucleotide in a polynucleotide molecule to form a base pair with another polynucleotide in a second polynucleotide molecule. For example, the sequence A-G-T is complementary to the sequence T-C-A. Complementarity may be partial, in which only some of the polynucleotides match according to base pairing, or complete, where all the polynucleotides match according to base pairing.
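The A-G-T / T-C-A example, and the distinction between partial and complete complementarity, can be illustrated with a minimal sketch. It assumes DNA bases only and compares the two sequences position by position, as written in the example above, without modeling strand orientation. The function names are hypothetical.

```python
# Watson-Crick DNA base pairs.
_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence (no reversal)."""
    return "".join(_PAIRS[base] for base in seq.upper())


def fraction_complementary(first: str, second: str) -> float:
    """Fraction of aligned positions at which the two sequences form a base pair."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(1 for a, b in zip(first.upper(), second.upper())
                 if _PAIRS.get(a) == b)
    return paired / len(first)


if __name__ == "__main__":
    print(complement("AGT"))                      # complement of A-G-T
    print(fraction_complementary("AGT", "TCA"))   # complete complementarity
    print(fraction_complementary("AGT", "TCG"))   # partial complementarity
```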

[0029] “Expression” refers to transcription and translation occurring within a host cell. The level of expression of a DNA molecule in a host cell may be determined on the basis of either the amount of corresponding mRNA that is present within the cell or the amount of DNA molecule encoded protein produced by the host cell (Sambrook et al., 1989, Molecular cloning: A Laboratory Manual, 18.1-18.88).

[0030] “Fusion protein” refers to a first protein having attached a second, heterologous protein. Preferably, the heterologous protein is fused via recombinant DNA techniques, such that the first and second proteins are expressed in frame. The heterologous protein can confer a desired characteristic to the fusion protein, for example, a detection signal, enhanced stability or stabilization of the protein, facilitated oligomerization of the protein, or facilitated purification of the fusion protein. Examples of heterologous proteins useful in the fusion proteins of the invention include molecules having one or more catalytic domains of AviIII, one or more binding domains of AviIII, one or more catalytic domains of a glycoside hydrolase other than AviIII, one or more binding domains of a glycoside hydrolase other than AviIII, or any combination thereof. Further examples include immunoglobulin molecules and portions thereof, peptide tags such as histidine tag (6-His), leucine zipper, substrate targeting moieties, signal peptides, and the like. Fusion proteins are also meant to encompass variants and derivatives of AviIII polypeptides that are generated by conventional site-directed mutagenesis and more modern techniques such as directed evolution, discussed infra.

[0031] “Genetically engineered” refers to any recombinant DNA or RNA method used to create a prokaryotic or eukaryotic host cell that expresses a protein at elevated levels, at lowered levels, or in a mutated form. In other words, the host cell has been transfected, transformed, or transduced with a recombinant polynucleotide molecule, and thereby been altered so as to cause the cell to alter expression of the desired protein. Methods and vectors for genetically engineering host cells are well known in the art; for example various techniques are illustrated in Current Protocols in Molecular Biology, Ausubel et al., eds. (Wiley & Sons, New York, 1988, and quarterly updates). Genetic engineering techniques include but are not limited to expression vectors, targeted homologous recombination and gene activation (see, for example, U.S. Pat. No. 5,272,071 to Chappel) and trans activation by engineered transcription factors (see, for example, Segal et al., 1999, Proc Natl Acad Sci USA 96(6):2758-63).

[0032] “Glycoside hydrolase family” refers to a family of enzymes which hydrolyze the glycosidic bond between two or more carbohydrates or between a carbohydrate and a non-carbohydrate moiety (Henrissat B., (1991) Biochem. J., 280:309-316). Identification of a putative glycoside hydrolase family member is made based on an amino acid sequence comparison and the finding of significant sequence similarity within the putative member's catalytic domain, as compared to the catalytic domains of known family members.

[0033] “Homology” refers to a degree of complementarity between polynucleotides, having significant effect on the efficiency and strength of hybridization between polynucleotide molecules. The term also can refer to a degree of similarity between polypeptides.

[0034] “Host cell” or “host cells” refers to cells expressing a heterologous polynucleotide molecule. Host cells of the present invention express polynucleotides encoding AviIII or a fragment thereof. Examples of suitable host cells useful in the present invention include, but are not limited to, prokaryotic and eukaryotic cells. Specific examples of such cells include bacteria of the genera Escherichia, Bacillus, and Salmonella, as well as members of the genera Pseudomonas, Streptomyces, and Staphylococcus; fungi, particularly filamentous fungi such as Trichoderma and Aspergillus, Phanerochaete chrysosporium and other white rot fungi; also other fungi including Fusaria, molds, and yeast including Saccharomyces sp., Pichia sp., and Candida sp. and the like; plants e.g. Arabidopsis, cotton, barley, tobacco, potato, and aquatic plants and the like; SF9 insect cells (Summers and Smith, 1987, Texas Agriculture Experiment Station Bulletin, 1555), and the like. Other specific examples include mammalian cells such as human embryonic kidney cells (293 cells), Chinese hamster ovary (CHO) cells (Puck et al., 1958, Proc. Natl. Acad. Sci. USA 60, 1275-1281), human cervical carcinoma cells (HELA) (ATCC CCL 2), human liver cells (Hep G2) (ATCC HB8065), human breast cancer cells (MCF-7) (ATCC HTB22), human colon carcinoma cells (DLD-1) (ATCC CCL 221), Daudi cells (ATCC CRL-213), murine myeloma cells such as P3/NSI/1-Ag4-1 (ATCC TIB-18), P3X63Ag8 (ATCC TIB-9), SP2/0-Ag14 (ATCC CRL-1581) and the like.

[0035] “Hybridization” refers to the pairing of complementary polynucleotides during an annealing period. The strength of hybridization between two polynucleotide molecules is impacted by the homology between the two molecules, stringency of the conditions involved, the melting temperature of the formed hybrid and the G:C ratio within the polynucleotides.

[0036] “Identity” refers to a comparison between pairs of nucleic acid or amino acid molecules. Methods for determining sequence identity are known. See, for example, computer programs commonly employed for this purpose, such as the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), that uses the algorithm of Smith and Waterman, 1981, Adv. Appl. Math., 2: 482-489.
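For illustration only, the local alignment recurrence of Smith and Waterman cited above can be sketched as follows. The scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for this sketch; they are not the parameters of the Gap program or of any particular software package, and the sketch returns only the best local alignment score rather than the alignment itself.

```python
def smith_waterman_score(first: str, second: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between two sequences.

    Uses the Smith-Waterman recurrence with a linear gap penalty:
    each cell is the maximum of 0, a diagonal step (match or mismatch),
    and a gap in either sequence.
    """
    previous = [0] * (len(second) + 1)
    best = 0
    for i in range(1, len(first) + 1):
        current = [0] * (len(second) + 1)
        for j in range(1, len(second) + 1):
            step = match if first[i - 1] == second[j - 1] else mismatch
            current[j] = max(0,
                             previous[j - 1] + step,  # align the two residues
                             previous[j] + gap,       # gap in second sequence
                             current[j - 1] + gap)    # gap in first sequence
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```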

[0037] “Isolated” refers to a polynucleotide or polypeptide that has been separated from at least one contaminant (polynucleotide or polypeptide) with which it is normally associated. For example, an isolated polynucleotide or polypeptide is in a context or in a form that is different from that in which it is found in nature.

[0038] “Nucleic acid sequence” refers to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along a polypeptide chain. The deoxyribonucleotide sequence thus codes for the amino acid sequence.

[0039] “Polynucleotide” refers to a linear sequence of nucleotides. The nucleotides may be ribonucleotides, or deoxyribonucleotides, or a mixture of both. Examples of polynucleotides in the context of the present invention include single and double stranded DNA, single and double stranded RNA, and hybrid molecules having mixtures of single and double stranded DNA and RNA. The polynucleotides of the present invention may contain one or more modified nucleotides.

[0040] “Protein,” “peptide,” and “polypeptide” are used interchangeably to denote an amino acid polymer or a set of two or more interacting or bound amino acid polymers.

[0041] “Purify,” or “purified” refers to a target protein that is free from at least 5-10% of contaminating proteins. Purification of a protein from contaminating proteins can be accomplished using known techniques, including ammonium sulfate or ethanol precipitation, acid precipitation, heat precipitation, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography, size-exclusion chromatography, and lectin chromatography. Various protein purification techniques are illustrated in Current Protocols in Molecular Biology, Ausubel et al., eds. (Wiley & Sons, New York, 1988, and quarterly updates).

[0042] “Selectable marker” refers to a marker that identifies a cell as having undergone a recombinant DNA or RNA event. Selectable markers include, for example, genes that encode antimetabolite resistance such as the DHFR protein that confers resistance to methotrexate (Wigler et al., 1980, Proc Natl Acad Sci USA 77:3567; O'Hare et al., 1981, Proc Natl Acad Sci USA, 78:1527), the GPT protein that confers resistance to mycophenolic acid (Mulligan & Berg, 1981, PNAS USA, 78:2072), the neomycin resistance marker that confers resistance to the aminoglycoside G-418 (Colbere-Garapin et al., 1981, J Mol Biol, 150:1), the Hygro protein that confers resistance to hygromycin (Santerre et al., 1984, Gene 30:147), and the Zeocin™ resistance marker (Invitrogen). In addition, the herpes simplex virus thymidine kinase, hypoxanthine-guanine phosphoribosyltransferase and adenine phosphoribosyltransferase genes can be employed in tk⁻, hgprt⁻ and aprt⁻ cells, respectively.

[0043] “Stringency” refers to the conditions (temperature, ionic strength, solvents, etc.) under which hybridization between polynucleotides occurs. A hybridization reaction conducted under high stringency conditions is one that will only occur between polynucleotide molecules that have a high degree of complementary base pairing (85% to 100% identity). Conditions for high stringency hybridization, for example, may include an overnight incubation at about 42° C. for about 2.5 hours in 6×SSC/0.1% SDS, followed by washing of the filters in 1.0×SSC at 65° C., 0.1% SDS. A hybridization reaction conducted under moderate stringency conditions is one that will occur between polynucleotide molecules that have an intermediate degree of complementary base pairing (50% to 84% identity).
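The identity ranges associated with high and moderate stringency can be expressed as a simple classification, sketched below. The sketch assumes two pre-aligned sequences of equal length and counts identical positions; the function names are hypothetical, and the treatment of values between 84% and 85% (assigned here to the moderate class) is an assumption, since the definition above does not address them.

```python
def percent_identity(first: str, second: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(first.upper(), second.upper()) if a == b)
    return 100.0 * matches / len(first)


def stringency_class(identity_percent: float) -> str:
    """Map percent identity to the stringency under which hybridization is expected.

    Thresholds follow the ranges in the definition: 85% to 100% for high
    stringency and 50% to 84% for moderate stringency.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent >= 85.0:
        return "high"
    if identity_percent >= 50.0:
        return "moderate"
    return "below moderate"


if __name__ == "__main__":
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(identity, stringency_class(identity))
```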

[0044] “Substrate targeting moiety” refers to any signal on a substrate, either naturally occurring or genetically engineered, used to target any AviIII polypeptide or fragment thereof to a substrate. Such targeting moieties include ligands that bind to a substrate structure. Examples of ligand/receptor pairs include carbohydrate binding domains and cellulose. Many such substrate-specific ligands are known and are useful in the present invention to target an AviIII polypeptide or fragment thereof to a substrate. A novel example is an AviIII carbohydrate binding domain that is used to tether other molecules to a cellulose-containing substrate such as a fabric.

[0045] “Thermal tolerant” refers to the property of withstanding partial or complete inactivation by heat and can also be described as thermal resistance or thermal stability. Although some variation exists in the literature, the following definitions can be considered typical for the optimum temperature range of stability and activity for enzymes: psychrophilic (below freezing to 10° C.); mesophilic (10° C. to 50° C.); thermophilic (50° C. to 75° C.); and caldophilic (75° C. to above boiling water temperature). The stability and catalytic activity of enzymes are linked characteristics, and the ways of measuring these properties vary considerably. For industrial enzymes, stability and activity are best measured under use conditions, often in the presence of substrate. Therefore, cellulases that must act on process streams of cellulose must be able to withstand exposure up to thermophilic or even caldophilic temperatures for digestion times in excess of several hours.
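The typical temperature classes listed above can be written as a small lookup, sketched below. Because adjacent ranges share their boundary values (10° C., 50° C., and 75° C.), the sketch assumes that a boundary temperature belongs to the higher class; that choice, like the function name, is an assumption made for illustration.

```python
def classify_temperature_optimum(temp_c: float) -> str:
    """Assign an enzyme's optimum temperature (degrees C) to a typical class.

    Boundary values are assigned to the higher class (an assumption; the
    ranges in the text overlap at their endpoints).
    """
    if temp_c < 10.0:
        return "psychrophilic"
    if temp_c < 50.0:
        return "mesophilic"
    if temp_c < 75.0:
        return "thermophilic"
    return "caldophilic"


if __name__ == "__main__":
    for temperature in (5.0, 37.0, 60.0, 80.0):
        print(temperature, classify_temperature_optimum(temperature))
```

Under this scheme, an enzyme with maximal activity at about 80° C. would fall in the caldophilic class.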

[0046] In encompassing a wide variety of potential applications for embodiments of the present invention, thermal tolerance refers to the ability to function in a temperature range of from about 15° C. to about 100° C. A preferred range is from about 30° C. to about 80° C. A highly preferred range is from about 50° C. to about 70° C. For example, a protein that can function at about 45° C. is considered in the preferred range even though it may be susceptible to partial or complete inactivation at temperatures in a range above about 45° C. and less than about 80° C. For polypeptides derived from organisms such as Acidothermus, the desirable property of thermal tolerance is often accompanied by other desirable characteristics such as: resistance to extreme pH degradation, resistance to solvent degradation, resistance to proteolytic degradation, resistance to detergent degradation, resistance to oxidizing agent degradation, resistance to chaotropic agent degradation, and resistance to general degradation. Cowan D A in Danson M J et al. (1992) The Archaebacteria, Biochemistry and Biotechnology at 149-159, University Press, Cambridge, ISBN 1855780100. Here ‘resistance’ is intended to include any partial or complete level of residual activity. When a polypeptide is described as thermal tolerant it is understood that any one, more than one, or none of these other desirable properties can be present.
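The nested temperature ranges in this definition can be illustrated by reporting the narrowest range that contains a given operating temperature. The sketch below treats the "about" limits as exact, inclusive bounds, which is a simplification, and the names are hypothetical.

```python
# Ranges from the definition above, ordered from narrowest to broadest
# (degrees C; "about" limits are treated as exact for this sketch).
TOLERANCE_RANGES = [
    ("highly preferred", 50.0, 70.0),
    ("preferred", 30.0, 80.0),
    ("thermal tolerant", 15.0, 100.0),
]


def narrowest_tolerance_range(temp_c: float):
    """Return the name of the narrowest range containing temp_c, or None."""
    for name, low, high in TOLERANCE_RANGES:
        if low <= temp_c <= high:
            return name
    return None


if __name__ == "__main__":
    for temperature in (20.0, 45.0, 60.0):
        print(temperature, narrowest_tolerance_range(temperature))
```

This reproduces the example in the text: a protein functioning at about 45° C. falls in the preferred range but not the highly preferred range.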

[0047] “Variant”, as used herein, means a polynucleotide or polypeptide molecule that differs from a reference molecule. Variants can include nucleotide changes that result in amino acid substitutions, deletions, fusions, or truncations in the resulting variant polypeptide when compared to the reference polypeptide.

[0048] “Vector,” “extra-chromosomal vector” or “expression vector” refers to a first polynucleotide molecule, usually double-stranded, which may have inserted into it a second polynucleotide molecule, for example a foreign or heterologous polynucleotide. The heterologous polynucleotide molecule may or may not be naturally found in the host cell, and may be, for example, one or more additional copies of the heterologous polynucleotide naturally present in the host genome. The vector is adapted for transporting the foreign polynucleotide molecule into a suitable host cell. Once in the host cell, the vector may be capable of integrating into the host cell chromosomes. The vector may optionally contain additional elements for selecting cells containing the integrated polynucleotide molecule as well as elements to promote transcription of mRNA from transfected DNA. Examples of vectors useful in the methods of the present invention include, but are not limited to, plasmids, bacteriophages, cosmids, retroviruses, and artificial chromosomes.

[0049] Within the application, unless otherwise stated, the techniques utilized may be found in any of several well-known references, such as: Molecular Cloning: A Laboratory Manual (Sambrook et al. (1989) Molecular cloning: A Laboratory Manual), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991 Academic Press, San Diego, Calif.), “Guide to Protein Purification” in Methods in Enzymology (M. P. Deutscher, ed., (1990) Academic Press, Inc.), PCR Protocols: A Guide to Methods and Applications (Innis et al. (1990) Academic Press, San Diego, Calif.), Culture of Animal Cells: A Manual of Basic Technique, 2nd ed. (R. I. Freshney (1987) Liss, Inc., New York, N.Y.), and Gene Transfer and Expression Protocols, pp 109-128 (ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.).

[0050] O-Glycoside Hydrolases:

[0051] Glycoside hydrolases are a large and diverse family of enzymes that hydrolyze the glycosidic bond between two carbohydrate moieties or between a carbohydrate and a non-carbohydrate moiety (See FIG. 2). Glycoside hydrolase enzymes are classified into glycoside hydrolase (GH) families based on significant amino acid similarities within their catalytic domains. Enzymes having related catalytic domains are grouped together within a family (Henrissat et al., (1991) supra, and Henrissat et al. (1996), Biochem. J. 316:695-696), where the underlying classification provides a direct relationship between the GH domain amino acid sequence and how a GH domain will fold. This information ultimately provides a common mechanism for how the enzyme will hydrolyze the glycosidic bond within a substrate, i.e., either by a retaining mechanism or inverting mechanism (Henrissat, B., (1991) supra).

[0052] Cellulases belong to the GH family of enzymes. Cellulases are produced by a variety of bacteria and fungi to degrade the β-1,4 glycosidic bond of cellulose and so produce successively smaller fragments of cellulose and ultimately glucose. At present, cellulases are found within at least 11 different GH families. Three different types of cellulase enzyme activities have been identified within these GH families: exo-acting cellulases which cleave successive disaccharide units from the non-reducing ends of a cellulose chain; endo-acting cellulases which randomly cleave successive disaccharide units within the cellulose chain; and β-glucosidases which cleave successive disaccharide units to glucose (J. W. Deacon, (1997) Modern Mycology, 3rd Ed., ISBN: 0-632-03077-1, 97-98).

[0053] Many cellulases are characterized by having a multiple domain unit within their overall structure, in which a GH or catalytic domain is joined to a carbohydrate-binding domain (CBD) by a glycosylated linker peptide (Koivula et al., (1996) Protein Expression and Purification 8:391-400). As noted above, cellulases do not belong to any one family of GH domains, but rather have been identified within at least 11 different GH families to date. The CBD type domain increases the concentration of the enzyme on the substrate, in this case cellulose, and the linker peptide provides flexibility for both larger domains.

[0054] Conversion of cellulose to glucose is an essential step in the production of ethanol or other biofuels from biomass. Cellulases are an important component of this process, where approximately one kilogram of cellulase can digest fifty kilograms of cellulose. Within this process, thermostable cellulases have taken precedence, due to their ability to function at elevated temperatures and under other conditions including pH extremes, solvent presence, detergent presence, proteolysis, etc. (see Cowan D A (1992), supra).

[0055] Highly thermostable cellulase enzymes are secreted by the cellulolytic thermophile Acidothermus cellulolyticus (U.S. Pat. Nos. 5,275,944 and 5,110,735). This bacterium was originally isolated from decaying wood in an acidic, thermal pool at Yellowstone National Park and deposited with the American Type Culture Collection (ATCC 43068) (Mohagheghi et al., (1986) Int. J. System. Bacteriol., 36:435-443).

[0056] Recently, a thermostable cellulase, E1 endoglucanase, was identified and characterized from Acidothermus cellulolyticus (U.S. Pat. No. 5,536,655). The E1 endoglucanase has maximal activity between 75 and 83° C. and is active to a pH well below 5. Thermostable cellulases, such as E1 endoglucanase, are useful in the conversion of biomass to biofuels, and in particular, are useful in the conversion of cellulose to glucose. Conversion of biomass to biofuel represents an extremely important alternative fuel source that is more environmentally friendly than conventional fuels, and provides a use, in some cases, for waste products.

[0057] AviIII:

[0058] As described more fully in the Examples below, AviIII, a novel thermostable cellulase, has now been identified and characterized. The predicted amino acid sequence of AviIII (SEQ ID NO: 1) has an organization characteristic of a cellulase enzyme. AviIII contains a carbohydrate binding domain-linker domain-catalytic domain-linker domain-fibronectin domain-linker domain-carbohydrate binding domain unit. In particular, AviIII includes a carbohydrate binding domain type III (CBDIII) (amino acids from about A35 to about A187), a GH74 catalytic domain (amino acids from about N231 to about P870), and a CBDII (amino acids from about G1021 to about S1121).

[0059] As discussed in more detail below (Example 2), significant amino acid similarity of AviIII to other cellulases identifies AviIII as a cellulase. In addition, the predicted amino acid sequence (SEQ ID NO: 1) indicates that a CBD type III domain is present as characterized by Tomme P. et al. (1995), in Enzymatic Degradation of Insoluble Polysaccharides (Saddler J N & Penner M, eds.), at 142-163, American Chemical Society, Washington. See also Tomme, P. & Claeyssens, M. (1989) FEBS Lett. 243, 239-243; Gilkes, N. R. et al., (1988) J. Biol. Chem. 263, 10401-10407.

[0060] AviIII, as noted above, has a catalytic domain, identified as belonging to the GH74 family. The GH74 domain family includes a number of exoglucanases, for example, from Cellulomonas fimi, and exoglucanase E3 isolated from Thermobifida fusca. The GH74 members degrade substrate using an inverting mechanism. Being a member of the GH74 family of proteins identifies AviIII as potentially having cellulase activity.

[0061] AviIII is also a thermostable cellulase as it is produced by the thermophile Acidothermus cellulolyticus. As discussed, AviIII polypeptides can have other desirable characteristics (see Cowan D A (1992), supra). Like other members of the cellulase family, and in particular thermostable cellulases, AviIII polypeptides are useful in the conversion of biomass to biofuels and biofuel additives, and in particular, biofuels from cellulose. It is envisioned that AviIII polypeptides could be used for other purposes, for example in detergents, pulp and paper processing, food and feed processing, and in textile processes. AviIII polypeptides can be used alone or in combination with one or more other cellulases or glycoside hydrolases to perform the uses described herein or known within the relevant art, all of which are within the scope of the present disclosure.

[0062] AviIII Polypeptides:

[0063] AviIII polypeptides of the invention include isolatedpolypeptides having an amino acid sequence as shown below in Example 1;Table 1 and in SEQ ID NO: 1, as well as variants and derivatives,including fragments, having substantial identity to the amino acidsequence of SEQ ID NO: 1 and that retain any of the functionalactivities of AviIII. AviIII polypeptide activity can be determined, forexample, by subjecting the variant, derivative, or fragment to asubstrate binding assay or a cellulase activity assay such as thosedescribed in Irwin D et al., J. Bacteriology 180(7): 1709-1714 (April1998). TABLE 1 AviIII Amino Acid sequence. (SEQ ID NO: 1)MDRSENIRLTMRSRRLVSLLAATASFAVAAALGVLPIAITASPAHAATTQPYTWSNVAIGGGGFVDGIVFNEGAPGILYVRTDIGGMYRWDAANGRWIPLLDWVGWNNWGYNGVVSIAADPINTNKVWAAVGMYTNSWDPNDGAILRSSDQGATWQITPLPFKLGGNMPGRGMGERLAVdPNNdNILYFGAPSGKGLWRSTDSGATWSQMTNFPDVGTYIANPTDTTGYQSDIQGVVWVAFDKSSSSLGQASKTIFVGVADPNNPVFWSRDGGATWQAVPGAPTGFIPHKGVFDPVNHVLYIATSNTGGPYDGSSGDVWKFSVTSGTWTRISPVPSTDTANDYFGYSGLTIDRQHPNTIMVATQISWWPDTIIFRSTDGGATWTRIWDWTSYPNRSLRYVLDISAEPWLTFGVQPNPPVPSPKLGWMDEAMAIDPFNSDRMLYGTGATLYATNDLTKWDSGGQIHIAPMVKGLEETAVNDLISPPSGAPLISALGDLGGFTHADVTAVPSTIFTSPVFTTGTSVDYAELNPSIIVRAGSFDPSSQPNDRHVAFSTDGGKNWFQGSEPGGVTTGGTVAASADGSRFVWAPGDPGQPVVYAVGFGNSWAASQGVPANAQIRSDRVNPKTFYALSNGTFYRSTDGGVTFQPVAAGLPSSGAVGVMFHAVPGKEGDLWLAASSGLYHSTNGGSSWSAITGVSSAVNVGFGKSAPGSSYPAVFVVGTIGGVTGAYRSDDCGTTWVLINDDQHQYGNWGQAITGDHANLRRVYIGTNGRGIVYGDIGGAPSGSPSPSVSPSASPSLSPSPSPSSSPSPSPSPSSSPSSSPSPSPSPSPSPSRSPSPSASPSPSSSPSPSSSPSSSPSPTPSSSPVSGGVKVQYKNNDSAPGDNQIKPGLQVVNTGSSSVDLSTVTVRYWFTRDGGSSTLVYNCDWAAIGCGNIRASFGSVNPATPT ADTYLQX*

[0064] As listed and described in Tables 1 and 5, the isolated AviIII polypeptide includes an N-terminal hydrophobic region that functions as a signal peptide, having an amino acid sequence that begins with Met1 and extends to about A34; a carbohydrate binding domain having sequence similarity to such type III domains that begins with about A35 and extends to about A187; a catalytic domain having significant sequence similarity to a GH74 family domain that begins with about N231 and extends to about P870; a fibronectin type III domain that begins with about D901 and extends to about G985; and a carbohydrate binding domain type II region that begins with about G1021 and extends to about S1121. Variants and derivatives of AviIII include, for example, AviIII polypeptides modified by covalent or aggregative conjugation with other chemical moieties, such as glycosyl groups, polyethylene glycol (PEG) groups, lipids, phosphate, acetyl groups, and the like.
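The domain organization recited above can be captured as a simple lookup table. The following is only an illustrative sketch: the names `Region`, `AVIIII_REGIONS`, and `region_of` are hypothetical, and the approximate boundaries stated in the text ("about A35", "about A187", and so on) are treated as exact residue numbers purely for convenience.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive


# Boundaries as recited in the text; the text gives them as approximate.
AVIIII_REGIONS = [
    Region("signal peptide", 1, 34),
    Region("carbohydrate binding domain type III", 35, 187),
    Region("GH74 catalytic domain", 231, 870),
    Region("fibronectin type III domain", 901, 985),
    Region("carbohydrate binding domain type II", 1021, 1121),
]


def region_of(position: int) -> Optional[str]:
    """Return the named region containing a 1-based residue position.

    Returns None when the position lies in a stretch the text does not
    assign to a named domain (for example, a linker between domains).
    """
    for region in AVIIII_REGIONS:
        if region.start <= position <= region.end:
            return region.name
    return None
```

Under these assumptions, `region_of(500)` falls in the GH74 catalytic domain, while `region_of(200)` returns `None` because residues 188 to 230 lie between the annotated domains.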

[0065] The amino acid sequence of AviIII polypeptides of the invention is preferably at least about 60% identical, more preferably at least about 70% identical, or in some embodiments at least about 90% identical, to the AviIII amino acid sequence shown above in Table 1 and SEQ ID NO: 1. The percentage identity, also termed homology (see definition above), can be readily determined, for example, by comparing the two polypeptide sequences using any of the computer programs commonly employed for this purpose, such as the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), which uses the algorithm of Smith and Waterman, 1981, Adv. Appl. Math. 2: 482-489.
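As a rough illustration of how a percentage identity figure may be derived once two sequences have been aligned, the sketch below counts matching columns in a pre-computed alignment. It is not the Gap program and performs no alignment itself; the function name `percent_identity`, the `-` gap character, and the choice of denominator (all alignment columns other than gap-to-gap columns) are assumptions made for this example, and other programs may use different conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences that are already aligned.

    Both strings must have the same length, with `gap` marking alignment
    gaps. Columns where both sequences carry a gap are ignored; a residue
    facing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in alignment")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True when the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For instance, a hypothetical variant differing from the first ten residues of Table 1 (`MDRSENIRLT`) at two positions would score 80% under this convention, satisfying the 60% and 70% thresholds but not the 90% threshold.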

[0066] Variants and derivatives of the AviIII polypeptide may further include, for example, fusion proteins formed of an AviIII polypeptide and a heterologous polypeptide. Preferred heterologous polypeptides include those that facilitate purification, oligomerization, stability, or secretion of the AviIII polypeptides.

[0067] AviIII polypeptide variants and derivatives, as used in the description of the invention, can contain conservatively substituted amino acids, meaning that one or more amino acids can be replaced by an amino acid that does not alter the secondary and/or tertiary structure of the polypeptide. Such substitutions can include the replacement of an amino acid by a residue having similar physicochemical properties, such as substituting one aliphatic residue (Ile, Val, Leu, or Ala) for another, or substitutions between basic residues Lys and Arg, acidic residues Glu and Asp, amide residues Gln and Asn, hydroxyl residues Ser and Tyr, or aromatic residues Phe and Tyr. Phenotypically silent amino acid exchanges are described more fully in Bowie et al., 1990, Science 247:1306-1310. In addition, functional AviIII polypeptide variants include those having amino acid substitutions, deletions, or additions to the amino acid sequence outside functional regions of the protein, for example, outside the catalytic and carbohydrate binding domains. These would include, for example, the various linker sequences that connect functional domains as defined herein.
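The substitution groups listed above can be expressed as sets of one-letter residue codes, which gives a quick way to check whether a proposed exchange falls within one of the named groups. This is a minimal sketch only: `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical names, and membership in a group does not by itself establish that a given substitution preserves structure or activity.

```python
# Groups exactly as listed in the text, using one-letter residue codes.
CONSERVATIVE_GROUPS = {
    "aliphatic": frozenset("IVLA"),  # Ile, Val, Leu, Ala
    "basic": frozenset("KR"),        # Lys, Arg
    "acidic": frozenset("ED"),       # Glu, Asp
    "amide": frozenset("QN"),        # Gln, Asn
    "hydroxyl": frozenset("SY"),     # Ser, Tyr
    "aromatic": frozenset("FY"),     # Phe, Tyr
}


def is_conservative(original: str, replacement: str) -> bool:
    """True when two different residues share at least one listed group.

    Identical residues return False because no substitution has occurred.
    """
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )
```

Note that Tyr appears in two groups, so both Ser-to-Tyr and Phe-to-Tyr exchanges qualify, whereas Ser-to-Phe does not.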

[0068] The AviIII polypeptides of the present invention are preferably provided in an isolated form, and preferably are substantially purified. The polypeptides may be recovered and purified from recombinant cell cultures by known methods, including, for example, ammonium sulfate or ethanol precipitation, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography, and lectin chromatography. Preferably, high performance liquid chromatography (HPLC) is employed for purification.

[0069] Another preferred form of AviIII polypeptides is that of recombinant polypeptides as expressed by suitable hosts. Furthermore, the hosts can simultaneously produce other cellulases such that a mixture is produced comprising an AviIII polypeptide and one or more other cellulases. Such a mixture can be effective in crude fermentation processing or other industrial processing.

[0070] AviIII polypeptides can be fused to heterologous polypeptides to facilitate purification. Many available heterologous peptides (peptide tags) allow selective binding of the fusion protein to a binding partner. Non-limiting examples of peptide tags include 6-His, thioredoxin, hemagglutinin, GST, and the OmpA signal sequence tag. A binding partner that recognizes and binds to the heterologous peptide can be any molecule or compound, including metal ions (for example, metal affinity columns), antibodies, antibody fragments, or any protein or peptide that preferentially binds the heterologous peptide to permit purification of the fusion protein.

[0071] AviIII polypeptides can be modified to facilitate formation of AviIII oligomers. For example, AviIII polypeptides can be fused to peptide moieties that promote oligomerization, such as leucine zippers and certain antibody fragment polypeptides, for example, Fc polypeptides. Techniques for preparing these fusion proteins are known, and are described, for example, in WO 99/31241 and in Cosman et al., 2001 Immunity 14:123-133. Fusion to an Fc polypeptide offers the additional advantage of facilitating purification by affinity chromatography over Protein A or Protein G columns. Fusion to a leucine-zipper (LZ), for example, a repetitive heptad repeat, often with four or five leucine residues interspersed with other amino acids, is described in Landschulz et al., 1988, Science, 240:1759.

[0072] It is also envisioned that an expanded set of variants and derivatives of AviIII polynucleotides and/or polypeptides can be generated to select for useful molecules, where such expansion is achieved not only by conventional methods such as site-directed mutagenesis (SDM) but also by more modern techniques, either independently or in combination.

[0073] Site-directed mutagenesis is considered an informational approach to protein engineering and can rely on high-resolution crystallographic structures of target proteins and some stratagem for specific amino acid changes (Van Den Burg, B.; Vriend, G.; Veltman, O. R.; Venema, G.; Eijsink, V. G. H. Proc. Nat. Acad. Sci. U.S. 1998, 95, 2056-2060). For example, modification of the amino acid sequence of AviIII polypeptides can be accomplished as is known in the art, such as by introducing mutations at particular locations by oligonucleotide-directed mutagenesis (Walder et al., 1986, Gene, 42:133; Bauer et al., 1985, Gene 37:73; Craik, 1985, BioTechniques, 12-19; Smith et al., 1981, Genetic Engineering: Principles and Methods, Plenum Press; and U.S. Pat. No. 4,518,584 and U.S. Pat. No. 4,737,462). SDM technology can also employ the recent advent of computational methods for identifying site-specific changes for a variety of protein engineering objectives (Hellinga, H. W. Nature Structural Biol. 1998, 5, 525-527).

[0074] The more modern techniques include, but are not limited to, non-informational mutagenesis techniques (referred to generically as “directed evolution”). Directed evolution, in conjunction with high-throughput screening, allows testing of statistically meaningful variations in protein conformation (Arnold, F. H. Nature Biotechnol. 1998, 16, 617-618). Directed evolution technology can include diversification methods similar to that described by Crameri A. et al. (1998, Nature 391: 288-291), site-saturation mutagenesis, staggered extension process (StEP) (Zhao, H.; Giver, L.; Shao, Z.; Affholter, J. A.; Arnold, F. H. Nature Biotechnol. 1998, 16, 258-262), and DNA synthesis/reassembly (U.S. Pat. No. 5,965,408).

[0075] Fragments of the AviIII polypeptide can be used, for example, to generate specific anti-AviIII antibodies. Using known selection techniques, specific epitopes can be selected and used to generate monoclonal or polyclonal antibodies. Such antibodies have utility in the assay of AviIII activity as well as in purifying recombinant AviIII polypeptides from genetically engineered host cells.

[0076] AviIII Polynucleotides:

[0077] The invention also provides polynucleotide molecules encoding theAviIII polypeptides discussed above. AviIII polynucleotide molecules ofthe invention include polynucleotide molecules having the nucleic acidsequence shown in Table 2 and SEQ ID NO: 2, polynucleotide moleculesthat hybridize to the nucleic acid sequence of Table 2 and SEQ ID NO:2under high stringency hybridization conditions (for example, 42°, 2.5hr., 6×SCC, 0.1%SDS); and polynucleotide molecules having substantialnucleic acid sequence identity with the nucleic acid sequence of Table 2and SEQ ID NO:2, particularly with those nucleic acids encoding thecatalytic domain, GH74 (from about amino acid A37 to about G776), thecarbohydrate binding domain III (from about amino acid V859 to about atleast Q946). TABLE 2 AvIII nucleotide sequence. (SEQ ID NO: 2)ATGGATCGTTCGGAGAACATCCGTCTGACTATGAGATCACGACGATTGGTATCACTGCTCGCCGCCACTGCGTCGTTCGCCGTGGCCGCCGCTCTGGGAGTTCTGCCCATCGCGATAACGGCTTCTCCTGCGCACGCGGCGACGACTCAGCCGTACACCTGGAGCAACGTGGCGATCGGGGGCGGCGGCTTTGTCGACGGGATCGTCTTCAATGAAGGTGCACCGGGAATTCTGTACGTGCGGACGGACATCGGGGGGATGTATCGATGGGATGCCGCCAACGGGCGGTGGATCCCTCTTCTGGATTGGGTGGGATGGAACAATTGGGGGTACAACGGCGTCGTCAGCATTGCGGCAGACCCGATCAATACTAACAAGGTATGGGCCGCCGTCGGAATGTACACCAACAGCTGGGACCCAAACGACGGAGCGATTCTCCGCTCGTCTGATCAGGGCGCAACGTGGCAAATAACGCCCCTGCCGTTCAAGCTTGGCGGCAACATGCCCGGGCGTGGAATGGGCGAGCGGCTTGCGGTGGATCCAAACAATGACAACATTCTGTATTTCGGCGCCCCGAGCGGCAAAGGGCTCTGGAGAAGCACAGATTCCGGCGCGACCTGGTCCCAGATGACGAACTTTCCGGACGTAGGCACGTACATTGCAAATCCCACTGACACGACCGGCTATCAGAGCGATATTCAAGGCGTCGTCTGGGTCGCTTTCGACAAGTCTTCGTCATCGCTCGGGCAAGCGAGTAAGACCATTTTTGTGGGCGTGGCGGATCCCAATAATCCGGTCTTCTGGAGCAGAGACGGCGGCGCGACGTGGCAGGCGGTGCCGGGTGCGCCGACCGGCTTCATCCCGCACAAGGGCGTCTTTGACCCGGTCAACCACGTGCTCTATATTGCCACCAGCAATACGGGTGGTCCGTATGACGGGAGCTCCGGCGACGTCTGGAAATTCTCGGTGACCTCCGGGACATGGACGCGAATCAGCCCGGTACCTTCGACGGACACGGCCAACGACTACTTTGGTTACAGCGGCCTCACTATCGACCGCCAGCACCCGAACACGATAATGGTGGCAACCCAGATATCGTGGTGGCCGGACACCATAATCTTTCGGAGCACCGACGGCGGTGCGACGTGGACGCGGATCTGGGATTGGACG
AGTTATCCCAATCGAAGCTTGCGATATGTGCTTGACATTTCGGCGGAGCCTTGGCTGACCTTCGGCGTACAGCCGAATCCTCCCGTACCCAGTCCGAAGCTCGGCTGGATGGATGAAGCGATGGCAATCGATCCGTTCAACTCTGATCGGATGCTCTACGGAACAGGCGCGACGTTGTACGCAACAAATGATCTCACGAAGTGGGACTCCGGCGGCCAGATTCATATCGCGCCGATGGTCAAAGGATTGGAGGAGACGGCGGTAAACGATCTCATCAGCCCGCCGTCTGGCGCCCCGCTCATCAGCGCTCTCGGAGACCTCGGCGGCTTCACCCACGCCGACGTTACTGCCGTGCCATCGACGATCTTCACGTCACCGGTGTTCACGACCGGCACCAGCGTCGACTATGCGGAATTGAATCCGTCGATCATCGTTCGCGCTGGAAGTTTCGATCCATCGAGCCAACCGAACGACAGGCACGTCGCGTTCTCGACAGACGGCGGCAAGAACTGGTTCCAAGGCAGCGAACCTGGCGGGGTGACGACGGGCGGCACCGTCGCCGCATCGGCCGACGGCTCTCGTTTCGTCTGGGCTCCCGGCGATCCCGGTCAGCCTGTGGTGTACGCAGTCGGATTTGGCAACTCCTGGGCTGCTTCGCAAGGTGTTCCCGCCAATGCCCAGATCCGCTCAGACCGGGTGAATCCAAAGACTTTCTATGCCCTATCCAATGGAACCTTCTATCGAAGCACGGACGGCGGCGTGACATTCCAACCGGTCGCGGCCGGTCTTCCGAGCAGCGGTGCCGTCGGTGTCATGTTCCACGCGGTGCCTGGAAAAGAAGGCGATCTGTGGCTCGCTGCATCGAGCGGGCTTTACCACTCAACCAATGGCGGCAGCAGTTGGTCTGCAATCACCGGCGTATCCTCCGCGGTGAACGTGGGATTTGGTAAGTCTGCGCCCGGGTCGTCATACCCAGCCGTCTTTGTCGTCGGCACGATCGGAGGCGTTACGGGGGCGTACCGCTCCGACGACTGTGGGACGACCTGGGTACTGATCAATGATGACCAGCACCAATACGGAAATTGGGGACAAGCAATCACCGGTGACCACGCGAATTTACGGCGGGTGTACATAGGCACGAACGGCCGTGGAATTGTATACGGGGACATTGGTGGTGCGCCGTCCGGATCGCCGTCTCCGTCGGTGAGTCCGTCGGCTTCGCCGAGCCTGAGCCCGAGCCCGAGCCCGAGCAGCTCGCCATCGCCGTCGCCGTCGCCGAGCTCGAGTCCATCCTCGTCGCCGTCTCCGTCGCCGTCACCATCGCCGAGTCCGTCTCGGTCTCCGTCACCATCGOCGTCGCCGAGCCCGTCTTCGTCACCGAGCCCGTCTTCGTCACCGTCTTCGTCGCCGAGCCCAACGCCGTCGTCGTCGCCGGTGTCGGGTGGGGTGAAGGTGCAGTATAAGAATAATGATTCGGCGCCGGGTGATAATCAGATCAAGCCGGGTTTGCAGGTGGTGAATACCGGGTCGTCGTCGGTGGATTTGTCGACGGTGACGGTGCGGTACTGGTTCACCCGGGATGGTGGCTCGTCGACACTGGTGTACAACTGTGACTGGGCGGCGATCGGGTGTGGGAATATCCGCGCCTCGTTCGGCTCGGTGAACCCGGCGACGCCGACGGCGGACACCTACCTGCAGN*

[0078] The AviIII polynucleotide molecules of the invention are preferably isolated molecules encoding the AviIII polypeptide having an amino acid sequence as shown in Table 1 and SEQ ID NO: 1, as well as derivatives, variants, and useful fragments of the AviIII polynucleotide. The AviIII polynucleotide sequence can include deletions, substitutions, or additions to the nucleic acid sequence of Table 2 and SEQ ID NO: 2.

[0079] The AviIII polynucleotide molecule of the invention can be cDNA, chemically synthesized DNA, DNA amplified by PCR, RNA, or combinations thereof. Due to the degeneracy of the genetic code, two DNA sequences may differ and yet encode identical amino acid sequences. The present invention thus provides an isolated polynucleotide molecule having an AviIII nucleic acid sequence encoding AviIII polypeptide, where the nucleic acid sequence encodes a polypeptide having the complete amino acid sequence as shown in Table 1 and SEQ ID NO: 1, or variants, derivatives, and fragments thereof.
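The degeneracy noted above can be illustrated with a short translation routine. The sketch below assumes the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical, and the second DNA sequence in the usage note is an invented synonymous recoding, not a sequence disclosed herein.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order so that the
# 64-letter amino acid string lines up with TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, read in frame from its first base.

    Stop codons are rendered as '*'; codons containing an ambiguous base
    (such as N) are rendered as 'X'.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)
    )
```

For example, the first twelve nucleotides of Table 2, `ATGGATCGTTCG`, and the hypothetical recoding `ATGGACCGCAGC` differ at three positions yet both translate to `MDRS`, the first four residues of Table 1.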

[0080] The AviIII polynucleotides of the invention have a nucleic acid sequence that is at least about 60% identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2, in some embodiments at least about 70% identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2, and in other embodiments at least about 90% identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2. Nucleic acid sequence identity is determined by known methods, for example by aligning two sequences in a software program such as the BLAST program (Altschul, S. F. et al. (1990) J. Mol. Biol. 215:403-410), available from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/).

[0081] The AviIII polynucleotide molecules of the invention also include isolated polynucleotide molecules having a nucleic acid sequence that hybridizes under high stringency conditions (as defined above) to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2. Hybridization of the polynucleotide is to about 15 contiguous nucleotides, or about 20 contiguous nucleotides, and in other embodiments about 30 contiguous nucleotides, and in still other embodiments about 100 contiguous nucleotides of the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2.

[0082] Useful fragments of the AviIII-encoding polynucleotide molecules described herein include probes and primers. Such probes and primers can be used, for example, in PCR methods to amplify and detect the presence of AviIII polynucleotides in vitro, as well as in Southern and Northern blots for analysis of AviIII. Cells expressing the AviIII polynucleotide molecules of the invention can also be identified by the use of such probes. Methods for the production and use of such primers and probes are known. For PCR, 5′ and 3′ primers corresponding to a region at the termini of the AviIII polynucleotide molecule can be employed to isolate and amplify the AviIII polynucleotide using conventional techniques.
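The relationship between a template and its terminal 5′ and 3′ primers can be sketched as follows. This is an illustration of the sequence relationship only: the function names `reverse_complement` and `terminal_primers` and the default primer length of 20 are assumptions, and practical primer design would also weigh melting temperature, secondary structure, and specificity, none of which is modelled here.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_primers(template: str, length: int = 20) -> tuple:
    """Return (forward, reverse) primers matching the termini of a template.

    The forward primer is identical to the first `length` bases of the
    coding strand; the reverse primer is the reverse complement of the
    last `length` bases, so that both primers read 5' to 3'.
    """
    template = template.upper()
    if length <= 0 or 2 * length > len(template):
        raise ValueError("primer length incompatible with template length")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse
```

Applied to the first 22 nucleotides of Table 2 as a toy template, `terminal_primers("ATGGATCGTTCGGAGAACATCC", 8)` gives the forward primer `ATGGATCG` and the reverse primer `GGATGTTC`.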

[0083] Other useful fragments of the AviIII polynucleotides include antisense or sense oligonucleotides comprising a single-stranded nucleic acid sequence capable of binding to a target AviIII mRNA (using a sense strand), or DNA (using an antisense strand) sequence.

[0084] Vectors and Host Cells:

[0085] The present invention also provides vectors containing the polynucleotide molecules of the invention, as well as host cells transformed with such vectors. Any of the polynucleotide molecules of the invention may be contained in a vector, which generally includes a selectable marker and an origin of replication, for propagation in a host. The vectors further include suitable transcriptional or translational regulatory sequences, such as those derived from mammalian, microbial, viral, or insect genes, operably linked to the AviIII polynucleotide molecule. Examples of such regulatory sequences include transcriptional promoters, operators, or enhancers, mRNA ribosomal binding sites, and appropriate sequences which control transcription and translation. Nucleotide sequences are operably linked when the regulatory sequence functionally relates to the DNA encoding the target protein. Thus, a promoter nucleotide sequence is operably linked to an AviIII DNA sequence if the promoter nucleotide sequence directs the transcription of the AviIII sequence.

[0086] Selection of suitable vectors for the cloning of AviIII polynucleotide molecules encoding the target AviIII polypeptides of this invention will depend upon the host cell into which the vector will be transformed, and, where applicable, the host cell from which the target polypeptide is to be expressed. Suitable host cells for expression of AviIII polypeptides include prokaryotes, yeast, and higher eukaryotic cells, each of which is discussed below.

[0087] The AviIII polypeptides to be expressed in such host cells may also be fusion proteins that include regions from heterologous proteins. As discussed above, such regions may be included to allow, for example, secretion, improved stability, or facilitated purification of the AviIII polypeptide. For example, a nucleic acid sequence encoding an appropriate signal peptide can be incorporated into an expression vector. A nucleic acid sequence encoding a signal peptide (secretory leader) may be fused in-frame to the AviIII sequence so that AviIII is translated as a fusion protein comprising the signal peptide. A signal peptide that is functional in the intended host cell promotes extracellular secretion of the AviIII polypeptide. Preferably, the signal sequence will be cleaved from the AviIII polypeptide upon secretion of AviIII from the cell. Non-limiting examples of signal sequences that can be used in practicing the invention include the yeast α-factor and the honeybee melittin leader in Sf9 insect cells.

[0088] Suitable host cells for expression of target polypeptides of the invention include prokaryotes, yeast, and higher eukaryotic cells. Suitable prokaryotic hosts to be used for the expression of these polypeptides include bacteria of the genera Escherichia, Bacillus, and Salmonella, as well as members of the genera Pseudomonas, Streptomyces, and Staphylococcus. For expression in prokaryotic cells, for example, in E. coli, the polynucleotide molecule encoding AviIII polypeptide preferably includes an N-terminal methionine residue to facilitate expression of the recombinant polypeptide. The N-terminal Met may optionally be cleaved from the expressed polypeptide.

[0089] Expression vectors for use in prokaryotic hosts generally comprise one or more phenotypic selectable marker genes. Such genes encode, for example, a protein that confers antibiotic resistance or that supplies an auxotrophic requirement. A wide variety of such vectors are readily available from commercial sources. Examples include pSPORT vectors, pGEM vectors (Promega, Madison, Wis.), pPROEX vectors (LTI, Bethesda, Md.), Bluescript vectors (Stratagene), and pQE vectors (Qiagen).

[0090] AviIII can also be expressed in yeast host cells from genera including Saccharomyces, Pichia, and Kluyveromyces. Preferred yeast hosts are S. cerevisiae and P. pastoris. Yeast vectors will often contain an origin of replication sequence from a 2μ yeast plasmid, an autonomously replicating sequence (ARS), a promoter region, sequences for polyadenylation, sequences for transcription termination, and a selectable marker gene. Vectors replicable in both yeast and E. coli (termed shuttle vectors) may also be used. In addition to the above-mentioned features of yeast vectors, a shuttle vector will also include sequences for replication and selection in E. coli. Direct secretion of the target polypeptides expressed in yeast hosts may be accomplished by the inclusion of a nucleotide sequence encoding the yeast α-factor leader sequence at the 5′ end of the AviIII-encoding nucleotide sequence.

[0091] Insect host cell culture systems can also be used for the expression of AviIII polypeptides. The target polypeptides of the invention are preferably expressed using a baculovirus expression system, as described, for example, in the review by Luckow and Summers, 1988 Bio/Technology 6:47.

[0092] The choice of a suitable expression vector for expression of AviIII polypeptides of the invention will depend upon the host cell to be used. Examples of suitable expression vectors for E. coli include pET, pUC, and similar vectors as is known in the art. Preferred vectors for expression of the AviIII polypeptides include the shuttle plasmid pIJ702 for Streptomyces lividans, pGAPZalpha-A, B, C and pPICZalpha-A, B, C (Invitrogen) for Pichia pastoris, and pFE-1 and pFE-2 for filamentous fungi and similar vectors as is known in the art.

[0093] Modification of an AviIII polynucleotide molecule to facilitate insertion into a particular vector (for example, by modifying restriction sites), ease of use in a particular expression system or host (for example, using preferred host codons), and the like, are known and are contemplated for use in the invention. Genetic engineering methods for the production of AviIII polypeptides include the expression of the polynucleotide molecules in cell free expression systems, in cellular hosts, in tissues, and in animal models, according to known methods.

[0094] Compositions

[0095] The invention provides compositions containing a substantially purified AviIII polypeptide of the invention and an acceptable carrier. Such compositions are administered to biomass, for example, to degrade the cellulose in the biomass into simpler carbohydrate units and ultimately, to sugars. These released sugars from the cellulose are converted into ethanol by any number of different catalysts. Such compositions may also be included in detergents for removal, for example, of cellulose containing stains within fabrics, or compositions used in the pulp and paper industry, to address conditions associated with cellulose content. Compositions of the present invention can be used in stonewashing jeans such as is well known in the art. Compositions can be used in the biopolishing of cellulosic fabrics, such as cotton, linen, rayon and Lyocell.

[0096] The invention provides pharmaceutical compositions containing a substantially purified AviIII polypeptide of the invention and if necessary a pharmaceutically acceptable carrier. Such pharmaceutical compositions are administered to cells, tissues, or patients, for example, to aid in delivery or targeting of other pharmaceutical compositions. For example, AviIII polypeptides may be used where carbohydrate-mediated liposomal interactions are involved with target cells. See Vyas S P et al. (2001), J. Pharmacy & Pharmaceutical Sciences May-August 4(2): 138-58.

[0097] The invention also provides reagents, compositions, and methods that are useful for analysis of AviIII activity and for the analysis of cellulose breakdown.

[0098] Compositions of the present invention may also include other known cellulases, and preferably, other known thermal tolerant cellulases for enhanced treatment of cellulose.

[0099] Antibodies

[0100] The polypeptides of the present invention, in whole or in part, may be used to raise polyclonal and monoclonal antibodies that are useful in purifying AviIII, or detecting AviIII polypeptide expression, as well as a reagent tool for characterizing the molecular actions of the AviIII polypeptide. Preferably, a peptide containing a unique epitope of the AviIII polypeptide is used in preparation of antibodies, using conventional techniques. Methods for the selection of peptide epitopes and production of antibodies are known. See, for example, Antibodies: A Laboratory Manual, Harlow and Lane (eds.), 1988 Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Monoclonal Antibodies, Hybridomas: A New Dimension in Biological Analyses, Kennett et al. (eds.), 1980 Plenum Press, New York.

[0101] Assays

[0102] Agents that modify, for example, increase or decrease, AviIII hydrolysis or degradation of cellulose can be identified, for example, by assay of AviIII cellulase activity and/or analysis of AviIII binding to a cellulose substrate. Incubation of cellulose in the presence of AviIII and in the presence or absence of a test agent and correlation of cellulase activity or carbohydrate binding permits screening of such agents. For example, cellulase activity and binding assays may be performed in a manner similar to those described in Irwin et al., J. Bacteriology 180(7): 1709-1714 (April 1998).

[0103] The AviIII-stimulated activity is determined in the presence and absence of a test agent and then compared. A lower AviIII-activated test activity in the presence of the test agent than in the absence of the test agent indicates that the test agent has decreased the activity of the AviIII. A higher AviIII-activated test activity in the presence of the test agent than in the absence of the test agent indicates that the test agent has increased the activity of the AviIII. Stimulators and inhibitors of AviIII may be used to augment, inhibit, or modify AviIII-mediated activity, and therefore may have potential industrial uses as well as potential use in the further elucidation of AviIII's molecular actions.
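The comparison described above might be reduced to practice along the following lines. This is a hedged sketch rather than a prescribed assay analysis: the function name `classify_agent`, the use of replicate means, and the 5% tolerance band for calling no change are all assumptions introduced for illustration, and activity values may be in any consistent unit.

```python
from statistics import mean


def classify_agent(with_agent, without_agent, tolerance=0.05):
    """Classify a test agent from replicate AviIII activity measurements.

    `with_agent` and `without_agent` are sequences of activity readings
    taken in the presence and absence of the agent. The agent is called an
    inhibitor or stimulator when the mean activity ratio departs from 1.0
    by more than `tolerance` (an arbitrary, assumed cut-off).
    """
    baseline = mean(without_agent)
    if baseline <= 0:
        raise ValueError("baseline activity must be positive")
    ratio = mean(with_agent) / baseline
    if ratio < 1.0 - tolerance:
        return "inhibitor"
    if ratio > 1.0 + tolerance:
        return "stimulator"
    return "no effect detected"
```

A real screen would normally replace the fixed tolerance with a statistical test sized to the assay's replicate variability.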

[0104] Therapeutic Applications

[0105] The AviIII polypeptides of the invention are effective in aiding in delivery or targeting of other pharmaceutical compositions within a host. For example, AviIII polypeptides may be used where carbohydrate-mediated liposomal interactions are involved with target cells. See Vyas S P et al. (2001), J. Pharm Pharm Sci May-August 4(2): 138-58.

[0106] AviIII polynucleotides and polypeptides, including vectors expressing AviIII, of the invention can be formulated as pharmaceutical compositions and administered to a host, preferably a mammalian host, including a human patient, in a variety of forms adapted to the chosen route of administration. The compounds are preferably administered in combination with a pharmaceutically acceptable carrier, and may be combined with or conjugated to specific delivery agents, including targeting antibodies and/or cytokines.

[0107] AviIII can be administered by known techniques, such as orally, parenterally (including subcutaneous injection, intravenous, intramuscular, intrasternal or infusion techniques), by inhalation spray, topically, by absorption through a mucous membrane, or rectally, in dosage unit formulations containing conventional non-toxic pharmaceutically acceptable carriers, adjuvants or vehicles. Pharmaceutical compositions of the invention can be in the form of suspensions or tablets suitable for oral administration, nasal sprays, creams, sterile injectable preparations, such as sterile injectable aqueous or oleaginous suspensions or suppositories.

[0108] For oral administration as a suspension, the compositions can be prepared according to techniques well-known in the art of pharmaceutical formulation. The compositions can contain microcrystalline cellulose for imparting bulk, alginic acid or sodium alginate as a suspending agent, methylcellulose as a viscosity enhancer, and sweeteners or flavoring agents. As immediate release tablets, the compositions can contain microcrystalline cellulose, starch, magnesium stearate and lactose or other excipients, binders, extenders, disintegrants, diluents and lubricants known in the art.

[0109] For administration by inhalation or aerosol, the compositions can be prepared according to techniques well-known in the art of pharmaceutical formulation. The compositions can be prepared as solutions in saline, using benzyl alcohol or other suitable preservatives, absorption promoters to enhance bioavailability, fluorocarbons or other solubilizing or dispersing agents known in the art.

[0110] For administration as injectable solutions or suspensions, the compositions can be formulated according to techniques well-known in the art, using suitable dispersing or wetting and suspending agents, such as sterile oils, including synthetic mono- or diglycerides, and fatty acids, including oleic acid.

[0111] For rectal administration as suppositories, the compositions can be prepared by mixing with a suitable non-irritating excipient, such as cocoa butter, synthetic glyceride esters or polyethylene glycols, which are solid at ambient temperatures, but liquefy or dissolve in the rectal cavity to release the drug.

[0112] Preferred administration routes include orally, parenterally, as well as intravenous, intramuscular or subcutaneous routes. More preferably, the compounds of the present invention are administered parenterally, i.e., intravenously or intraperitoneally, by infusion or injection.

[0113] Solutions or suspensions of the compounds can be prepared in water or isotonic saline (PBS) and optionally mixed with a nontoxic surfactant. Dispersions may also be prepared in glycerol, liquid polyethylene glycols, DNA, vegetable oils, triacetin and mixtures thereof. Under ordinary conditions of storage and use, these preparations may contain a preservative to prevent the growth of microorganisms.

[0114] The pharmaceutical dosage form suitable for injection or infusion use can include sterile, aqueous solutions or dispersions or sterile powders comprising an active ingredient which are adapted for the extemporaneous preparation of sterile injectable or infusible solutions or dispersions. In all cases, the ultimate dosage form should be sterile, fluid and stable under the conditions of manufacture and storage. The liquid carrier or vehicle can be a solvent or liquid dispersion medium comprising, for example, water, ethanol, a polyol such as glycerol, propylene glycol, or liquid polyethylene glycols and the like, vegetable oils, nontoxic glyceryl esters, and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the formation of liposomes, by the maintenance of the required particle size, in the case of dispersion, or by the use of nontoxic surfactants. The prevention of the action of microorganisms can be accomplished by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and the like. In many cases, it will be desirable to include isotonic agents, for example, sugars, buffers, or sodium chloride. Prolonged absorption of the injectable compositions can be brought about by the inclusion in the composition of agents delaying absorption, for example, aluminum monostearate hydrogels and gelatin.

[0115] Sterile injectable solutions are prepared by incorporating the compounds in the required amount in the appropriate solvent with various other ingredients as enumerated above and, as required, followed by filter sterilization. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying techniques, which yield a powder of the active ingredient plus any additional desired ingredient present in the previously sterile-filtered solutions.

[0116] Industrial Applications

[0117] The AviIII polypeptides of the invention are effective cellulases. In the methods of the invention, the cellulose degrading effects of AviIII are achieved by treating biomass at an AviIII:biomass ratio of about 1 to about 50, or about 1:40, 1:35, 1:30, 1:25, 1:20, or even about 1:70 in some preparations of the AviIII. AviIII may be used under extreme conditions, for example, elevated temperatures and acidic pH. Treated biomass is degraded into simpler forms of carbohydrates, and in some cases glucose, which is then used in the formation of ethanol or other industrial chemicals, as is known in the art. Other methods are envisioned to be within the scope of the present invention, including methods for treating fabrics to remove cellulose-containing stains and other methods already discussed. AviIII polypeptides can be used in any known application currently utilizing a cellulase, all of which are within the scope of the present invention.
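As a worked illustration of the ratios recited in paragraph [0117], the following Python sketch converts an AviIII:biomass ratio into an enzyme loading. The specification does not state the basis of the ratio; the sketch assumes a mass-to-mass ratio, and the function and parameter names are hypothetical.

```python
def enzyme_mass_for_biomass(biomass_mass_g, biomass_parts_per_enzyme_part):
    """Return the AviIII mass (g) for a given biomass mass at a 1:N ratio.

    Assumes (not stated in the source) that the AviIII:biomass ratio is a
    mass ratio, so 1:50 means 1 g of enzyme per 50 g of biomass.
    """
    if biomass_mass_g < 0:
        raise ValueError("biomass mass cannot be negative")
    if biomass_parts_per_enzyme_part <= 0:
        raise ValueError("ratio denominator must be positive")
    return biomass_mass_g / biomass_parts_per_enzyme_part


# Example: 1 kg of biomass at ratios of 1:50 and 1:20
for n in (50, 20):
    print(f"1:{n} -> {enzyme_mass_for_biomass(1000.0, n):.1f} g AviIII")
```

Under this assumption, 1 kg of biomass calls for 20 g of enzyme at 1:50 and 50 g at 1:20.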

[0118] Having generally described the invention, the same will be more readily understood by reference to the following examples, which are provided by way of illustration and are not intended as limiting.

EXAMPLES

Example 1

[0119] Molecular Cloning of AviIII

[0120] Genomic DNA was isolated from Acidothermus cellulolyticus and purified by banding on cesium chloride gradients. Genomic DNA was partially digested with Sau 3A and separated on agarose gels. DNA fragments in the range of 9-20 kilobase pairs were isolated from the gels. This purified Sau 3A digested genomic DNA was ligated into the Bam H1 acceptor site of purified EMBL3 lambda phage arms (Clontech, San Diego, Calif.). Phage DNA was packaged according to the manufacturer's specification and plated with E. coli LE392 in top agar which contained the soluble cellulose analog, carboxymethylcellulose (CMC). The plates were incubated overnight (12-24 hours) to allow transfection, bacterial growth, and plaque formation. Plates were stained with Congo Red followed by destaining with 1 M NaCl. Lambda plaques harboring endoglucanase clones showed up as unstained plaques on a red background.

[0121] Lambda clones which screened positive on CMC-Congo Red plates were purified by successive rounds of picking, plating and screening. Individual phage isolates were named SL-1, SL-2, SL-3, and SL-4. Subsequent subcloning efforts employed the SL-3 clone which contained an approximately 14.2 kilobase fragment of Acidothermus cellulolyticus genomic DNA.

[0122] Template DNA was constructed using a 9 kilobase Bam H1 fragment obtained from the 14.2 kilobase lambda clone SL-3 prepared from Acidothermus cellulolyticus genomic DNA. The 9 kilobase Bam H1 fragment from SL-3 was subcloned into pDR540 to generate a plasmid NREL501. NREL501 was sequenced by the primer walking method as is known in the art. NREL501 was then subcloned into pUC19 using restriction enzymes Pst I and Eco RI and transformed into E. coli XL1-Blue (Stratagene) for the production of template DNA for sequencing. Each subclone was sequenced from both the forward and reverse directions. DNA for sequencing was prepared from an overnight growth in 500 mL LB broth using a megaprep DNA purification kit from Promega. The template DNA was PEG precipitated and suspended in de-ionized water and adjusted to a final concentration of 0.25 milligrams/mL.

[0123] Custom primers were designed by reading upstream known sequence and selecting segments of an appropriate length to function, as is well known in the art. Primers for cycle sequencing were synthesized at the Macromolecular Resources Facility located at Colorado State University in Fort Collins, Colo. Typically the sequencing primers were 26 to 30 nucleotides in length, but were sometimes longer or shorter to accommodate a melting temperature appropriate for cycle sequencing. The sequencing primers were diluted in de-ionized water, the concentration measured using UV absorbance at 260 nm, and then adjusted to a final concentration of 5 pmol/microL.
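The concentration adjustment described in paragraph [0123] can be illustrated with the Beer-Lambert relation. The sketch below is not the procedure actually used by the inventors; the function names, the extinction coefficient, and the example readings are hypothetical, and the molar extinction coefficient of each primer must be supplied by the user (for example, from the oligonucleotide supplier's data sheet).

```python
def primer_concentration_pmol_per_ul(a260, extinction_coeff_per_m_cm,
                                     path_length_cm=1.0, dilution_factor=1.0):
    """Estimate primer concentration from absorbance at 260 nm.

    Beer-Lambert: A = epsilon * c * l, so c (mol/L) = A / (epsilon * l).
    1 micromol/L equals 1 pmol/microL, hence the factor of 1e6.
    `dilution_factor` corrects for dilution of the sample before reading.
    """
    if extinction_coeff_per_m_cm <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a260 * dilution_factor / (extinction_coeff_per_m_cm * path_length_cm)
    return molar * 1e6


def water_to_add_ul(stock_pmol_per_ul, stock_volume_ul, target_pmol_per_ul=5.0):
    """Volume of water (microL) needed to bring a stock down to the target."""
    if stock_pmol_per_ul < target_pmol_per_ul:
        raise ValueError("stock is already below the target concentration")
    final_volume = stock_pmol_per_ul * stock_volume_ul / target_pmol_per_ul
    return final_volume - stock_volume_ul


# Hypothetical reading: A260 = 0.5 on a 1:100 dilution, epsilon = 250000 /(M cm)
conc = primer_concentration_pmol_per_ul(0.5, 250_000, dilution_factor=100)
print(conc, water_to_add_ul(conc, 10.0))
```

The 5 pmol/microL default is the final concentration stated in the paragraph; all other numbers are placeholders.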

[0124] Templates and sequencing primers were shipped to the Iowa State University DNA Sequencing Facility at Ames, Iowa for sequencing using standard chemistries for cycle sequencing. In some cases, regions of the template that sequenced poorly using the standard protocols and dye terminators were repeated with the addition of 2 microL DMSO and by using nucleotides optimized for the sequencing of high GC content DNA. An inverse PCR technique known in the art was applied to continue sequencing the genomic DNA, and a primer walking method was used to sequence the large PCR products. Each PCR fragment was sequenced from both strands, using high fidelity commercial DNA polymerase.

[0125] Sequencing data from primer walking and subclones were assembled together to verify that all SL-3 regions had been sequenced from both strands. An open reading frame (ORF) was found in the 9 kilobase Bam H1 fragment, C-terminal of E1 (U.S. Pat. No. 5,536,655), termed AviIII. An ORF of 3366 bp [SEQ ID NO:2] and deduced amino acid sequence [SEQ ID NO:1] are shown in Tables 1 and 2. The amino acid sequence predicted by SEQ ID NO: 1 was determined to have significant homology to known cellulases, as is shown below in Example 2 and Table 3.
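A simple consistency check on a candidate ORF such as the 3366 bp sequence of paragraph [0125] can be sketched as follows. The function is a hypothetical helper, it uses only the three standard stop codons, and the short test sequence is a placeholder rather than any part of SEQ ID NO:2. Note that 3366 is divisible by 3 (1122 codons), so if the final codon is a stop codon the ORF would encode 1121 residues; whether that holds for SEQ ID NO:2 is not stated in this section.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def describe_orf(sequence):
    """Summarize an ORF given as a DNA string read in frame from its start."""
    seq = sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    ends_with_stop = codons[-1] in STOP_CODONS
    internal_stop = any(c in STOP_CODONS for c in codons[:-1])
    return {
        "codons": len(codons),
        "ends_with_stop": ends_with_stop,
        "internal_stop": internal_stop,
        # residues encoded, excluding a terminal stop codon if present
        "residues": len(codons) - (1 if ends_with_stop else 0),
    }


print(describe_orf("ATGAAATAA"))  # placeholder sequence, not SEQ ID NO:2
print(3366 // 3)                  # codon count for a 3366 bp ORF
```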

[0126] The amino acid sequence represents a novel member of the family of proteins with cellulase activity. Due to the source of isolation, from the thermophilic Acidothermus cellulolyticus, AviIII is a novel member of cellulases with properties including thermal tolerance. It is also known that thermal tolerant enzymes may have other properties (see definition above).

Example 2

[0127] AviIII includes a GH74 catalytic domain

[0128] Sequence alignments and comparisons of the amino acid sequences of the Acidothermus cellulolyticus AviIII catalytic domain (approximately amino acids 37 to 776) and Aspergillus aculeatus Avicelase III (endoglucanase) polypeptides were prepared using the ClustalW program (Thompson J. D. et al. (1994), Nucleic Acids Res. 22:4673-4680) from the EMBL European Bioinformatics Institute website (http://www.ebi.ac.uk/). An examination of the amino acid sequence alignment of the GH74 domain indicates that the amino acid sequence of the AviIII catalytic domain is homologous to the amino acid sequence of a known GH74 family catalytic domain, that of Aspergillus aculeatus Avicelase III (endoglucanase) (see Table 3). In Table 3, the notations are as follows: an asterisk “*” indicates identical or conserved residues in all sequences in the alignment; a colon “:” indicates conserved substitutions; a period “.” indicates semi-conserved substitutions; and a hyphen “-” indicates a gap in the sequence. The amino acid sequence predicted for the AviIII GH74 domain is approximately 46% identical to the Aspergillus aculeatus Avicelase III (endoglucanase) GH74 domain, indicating that the AviIII catalytic domain is a member of the GH74 family (Henrissat et al., (1991) supra).

TABLE 3
Multiple amino acid sequence alignment of an AviIII catalytic domain and polypeptides with Glycoside Hydrolase Family 74 catalytic domains.
Multi alignment of related Glycoside Hydrolase Family 74 catalytic domain
GH74_Ace: Acidothermus cellulolyticus AviIII catalytic domain GH74
AviIII_Aac: Aspergillus aculeatus Avicelase III (endoglucanase). GenBank Acc. # BAA29031

GH74_Ace   ATTQPYTWSNVAIGGGG-FVDGIVFNEGAPGILYVRTDIGGMYRWDAANGRWIPLLDWVG
AviIII_Aac AASQAYTWKNVVTGGGGGFTPGIVFNPSAKGVAYARTDIGGAYRLNSDD-TWTPLMDWVG
           *::*.***.**. **** *. ***** .* *: *.****** ** :: :  * **:****

GH74_Ace   WNNWGYNGVVSIAADPINTNKVWAAVGMYTNSWDPNDGAILRSSDQCATWQITPLPFKLG
AviIII_Aac NDTWHDWGIDALATDPVDTDRVYVAVGMYTNEWDPNVGSILRSTDQGDTWTETKLPFKVG
            :.*   *:::*:**::*::*:.*******.**** *:****:*** **  * ****:*

GH74_Ace   GNMPGRGMGERLAVDPNNDNILYFGAPSGKGLWRSTDSGATWSQMTNFPDVGTYIANPTD
AviIII_Aac GNMPGRGMGERLAVDPNKNSILYFGARSGHGLWKSTDYGATWSNVTSFTWTGTYFQDSSS
           *****************::.****** **:***:*** *****::*.*. .***: :.:.

GH74_Ace   TTGYQSDIQGVVWVAFDKSSSSLGQASKTIFVGVADPNNPVFWSRDGGATWQAVPGAR-T
AviIII_Aac T--YTSDPVGIAWVTFDSTSGSSGSATPRIFVGVADAGKSVFKSEDAGATWANVSGEPQY
           *  * **  *:.**:**.:*.* *.*:  *******..:.** *.*.****  *.* *

GH74_Ace   GFIPHKGVFDPVNHVLYIATSNTGGPYDGSSGDVWKFSVTSGTWTRISPVPSTDTANDYF
AviIII_Aac GFLPHKGVLSPEEKTLYISYANGAGPYDGTNGTVHKYNITSGVWTDISP---TSLASTYY
           **:*****:.*::.***: :* .*****:.* * *:.:***.** ***   *. *. *:

GH74_Ace   GYSGLTIDRQHPNTIMVATQISWWPDTIIFRSTDGGATWTRIWDWTSYPNRSLRYVLDIS
AviIII_Aac GYGGLSVDLQVPGTLMVAALNCWWPDELIFRSTDSGATWSPIWEWNGYPSINYYYSYDIS
           **.**::* **.*:***:  .**** :******.****: **:*..**. .  *  ***

GH74_Ace   AEPWLTFGVQPNPPVPSPKLGWMDEAMAIDPFNSDRMLYGTGATLYATNDLTKWDSGGQI
AviIII_Aac NAPWIQDTTSTDQFP--VRVGMMVEALAIDPFDSNHWLYGTGLTVYGGHDLTNWDSKHNV
           **:    ...:      ::*** **:*****:*:: ***** *:*. :***:***  ::

GH74_Ace   HIAPMVKGLEETAVNDLISPPSGAPLISALGDLGGFTHADVTAVPSTIFTSPVFTTGTSV
AviIII_Aac TVKSLAVGIEEMAVLGLITPPGGPALLSAVGDDGGFYHSDLDAAPNQAYHTPTYGTTNGI
            : .:. *.**** .**:**.*..*:**.** *** *:*: *.*. : :*.: *..:

GH74_Ace   DYAELNPSIIVRAGSFDPSSQPNDRHVAFSTDGGKNWFQGSEPGGVTTGGTVAASADGSR
AviIII_Aac DYAGNKPSNIVRSGASDDYP-----TLALSSNFGSTWYADYAASTSTGTGAVALSADGDT
            *** :*****:*: *  .      :*:*:: *..*: . .. *..*:** ****.

GH74_Ace   FVWAPGDPGQPVVYAVGFGNSWAASQGVPANAQIRSDRVNPKTFYALSNGTFYRSTDGGV
AviIII_Aac VLLMSSTSGALVSKSQG---TLTAVSSLPSGAVIASDKSDNTVFYGGSAGAIYVSKNTAT
            .: .. .* *: *   : :* ..:*:.* * **: : ..**. * *::* *.: ..

GH74_Ace   TFQPVAAGLPSSGAVGVMFHAVPGKEGDLWLAASSGLYHSTNGGSSWSAI-TGVSSAVNV
AviIII_Aac SFTKTVS-LGSSTTVNAIR-AHPSIAGDVWASTDKGLWHSTDYGSTFTQIGSGVTAGWST
            :*  ..: *** :*..: * *. **:* ::..**:***: **::: * :**::. ..

GH74_Ace   GFGKSAPGSSYPAVFVVGTIGGVTGAYRSDDCGTTWVLINDDQHQYGN-WGQAITGDHAN
AviIII_Aac GFGKASSTGSYVVIYGFFTIDGAAGLFKSEDAGTNWQVISDASHGFGSSGANVVNGDLQT
            ****::. .**.:: . **.*.:* ::*:*.**.* :*.* .* :*.  .:.:.** .

GH74_Ace   LRRVYIGTNGRGIVYGDIGGAPSG
AviIII_Aac YGRVFRGHERPGHLLRQSQREPAG
             **: *:  * :  *    *:*
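The percent identity figure quoted in paragraph [0128] is the kind of quantity that can be computed directly from a pairwise alignment such as Table 3. The following Python sketch shows one common convention, identical residues divided by aligned (non-gap) columns; the source does not state which denominator was used for the approximately 46% value, so the convention, the function name, and the test strings here are assumptions for illustration only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns where either row has a gap are skipped (one of several
    conventions in use; others divide by the full alignment length or by
    the length of the shorter sequence).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


# Toy rows, not taken from Table 3
print(percent_identity("ACD-F", "ACE-F"))
```

To apply this to Table 3, the GH74_Ace and AviIII_Aac rows of every block would be concatenated in order before calling the function.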

Example 3

[0129] Mixed Domain GH74, CBD II, CBD III Genes and Hybrid Polypeptides

[0130] From the putative locations of the domains in the AviIII cellulase sequence given above and in comparable cloned cellulase sequences from other species, one can separate individual domains and combine them with one or more domains from different sequences. The significant similarity between cellulase genes permits one by recombinant techniques to arrange one or more domains from the Acidothermus cellulolyticus AviIII cellulase gene with one or more domains from a cellulase gene from one or more other microorganisms. Other representative endoglucanase genes include Bacillus polymyxa beta-(1,4) endoglucanase (Baird et al, Journal of Bacteriology, 172: 1576-86 (1992)) and Xanthomonas campestris beta-(1,4)-endoglucanase A (Gough et al, Gene 89:53-59 (1990)). The result of the fusion of any two or more domains will, upon expression, be a hybrid polypeptide. Such hybrid polypeptides can have one or more catalytic or binding domains. For ease of manipulation, recombinant techniques may be employed such as the addition of restriction enzyme sites by site-specific mutagenesis. If one is not using one domain of a particular gene, any number of any type of change including complete deletion may be made in the unused domain for convenience of manipulation.
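The domain shuffling described in paragraph [0130] can be pictured at the sequence level with a short Python sketch. Everything in it is hypothetical: the `Domain` record, the `build_hybrid` helper, the optional linker, and the placeholder residue strings are illustrative only and do not correspond to any sequence in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str      # e.g. "GH74 catalytic domain"
    source: str    # organism or gene the domain was taken from
    sequence: str  # amino acid sequence (placeholder strings below)


def build_hybrid(domains, linker=""):
    """Join domain sequences, in the order given, into one hybrid polypeptide."""
    if not domains:
        raise ValueError("at least one domain is required")
    return linker.join(d.sequence for d in domains)


# Placeholder sequences, not real AviIII or endoglucanase domains
catalytic = Domain("catalytic domain", "organism A", "MKTAY")
binding = Domain("binding domain", "organism B", "GSWNC")
print(build_hybrid([catalytic, binding], linker="PTPT"))
```

In a real construct the joining would be done at the nucleic acid level, in frame, for example through restriction sites introduced by site-specific mutagenesis as the paragraph notes.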

[0131] It is understood for purposes of this disclosure, that various changes and modifications may be made to the invention that are well within the scope of the invention. Numerous other changes may be made which will readily suggest themselves to those skilled in the art and which are encompassed in the spirit of the invention disclosed herein and as defined in the appended claims.

[0132] This specification contains numerous citations to references such as patents, patent applications, and publications. Each is hereby incorporated by reference for all purposes.

What is claimed is:
 1. A composition comprising a substantially purified thermostable AviIII peptide, said AviIII peptide comprising a catalytic domain GH74 and a carbohydrate binding domain (CBD) III.
 2. The composition of claim 1 wherein the thermostable AviIII peptide is further defined as comprising a linker and a signal sequence.
 3. The composition of claim 1 or 2 wherein the GH74 catalytic domain of the thermostable AviIII peptide is further defined as having a length of about 730 to about 760 amino acids.
 4. The composition of claim 1, 2, or 3 wherein the carbohydrate binding domain (CBD) III of the thermostable AviIII peptide is further defined as comprising a length of about 80 to about 150 amino acids.
 5. The composition of claim 1, 2, 3, or 4 wherein the carbohydrate binding domain (CBD) III of the thermostable AviIII peptide is further defined as comprising a length of about 90 amino acids.
 6. The composition of claim 3 wherein the GH74 catalytic domain is further defined as the sequence of SEQ ID NO: 3.
 7. The composition of claim 4 wherein the carbohydrate binding domain (CBD) III is further defined as the sequence of SEQ ID NO: 4.
 8. The composition of claim 4 wherein the carbohydrate-binding domain (CBD) III is further defined as comprising the sequence of SEQ ID NO: 5.
 9. The composition of claim 1 further defined as comprising a sequence of SEQ ID NO: 3 and SEQ ID NO: 4.
 10. The composition of claim 1 further defined as comprising a nucleic acid sequence having about 70% sequence identity to the sequence of SEQ ID NO:2.
 11. The composition of claim 1 further defined as comprising a nucleic acid sequence having about 80% sequence identity to the sequence of SEQ ID NO:2.
 12. A thermostable AviIII peptide having a sequence of SEQ ID NO: 1.
 13. The thermostable AviIII peptide of claim 12 further defined as having a sequence of SEQ ID NO: 2.
 14. An industrial mixture suitable for degrading cellulose, such mixture comprising the thermostable AviIII polypeptide of claim 1.
 15. The industrial mixture of claim 14 further defined as comprising a detergent.
 16. An isolated polynucleotide molecule encoding a thermostable AviIII polypeptide, said AviIII polypeptide comprising: a) a sequence of SEQ ID NO: 1; b) a sequence of SEQ ID NO: 3; c) a sequence of SEQ ID NO: 4; d) a sequence of SEQ ID NO: 5; e) a sequence having about 70% sequence identity with the sequence of a), b), c) or d).
 17. The isolated polynucleotide molecule of claim 16 comprising a nucleic acid sequence having about 90% sequence identity to the sequence of SEQ ID NO: 2.
 18. The isolated polynucleotide molecule of claim 16 comprising a nucleic acid sequence having about 80% sequence identity to the sequence of SEQ ID NO: 2.
 19. The isolated polynucleotide molecule of claim 16, comprising a nucleic acid sequence having about 90% sequence identity to the nucleic acid sequence encoding the sequence of SEQ ID NO:3.
 20. The isolated polynucleotide molecule of claim 16, comprising a nucleic acid sequence having about 90% sequence identity to the nucleic acid sequence encoding the sequence of SEQ ID NO:5.
 21. The isolated polynucleotide molecule of claim 16, comprising a nucleic acid sequence having about 90% sequence identity to the nucleic acid sequence encoding the sequence of SEQ ID NO: 1.
 22. The isolated polynucleotide molecule of claim 16, further comprising a nucleic acid sequence encoding a heterologous protein in frame with the polynucleotide molecule of claim 1.
 23. The isolated polynucleotide molecule of claim 22, wherein the heterologous protein is a peptide tag.
 24. The isolated polynucleotide molecule of claim 22, wherein the peptide tag is 6-His, thioredoxin, hemagglutinin, GST, or OmpA signal sequence tag.
 25. The isolated polynucleotide molecule of claim 22, wherein the heterologous protein is a substrate targeting moiety.
 26. The isolated polynucleotide molecule of claim 16, operably linked to a transcriptional or translational regulatory sequence.
 27. The isolated polynucleotide molecule of claim 26, wherein the transcriptional or translational regulatory sequence comprises a transcriptional promoter or enhancer.
 28. An isolated polypeptide molecule comprising: a) a sequence of SEQ ID NO: 3; b) a sequence of SEQ ID NO: 4; c) a sequence of SEQ ID NO: 5; d) a sequence of SEQ ID NO: 1; or e) a sequence of SEQ ID NO: 3, SEQ ID NO:4, and SEQ ID NO: 5; or f) a sequence having about 70% sequence identity with the sequence of a), b), c), d), or e).
 29. The polypeptide molecule of claim 28, having about 90% sequence identity with the sequence of a), b), c), d), e) or f).
 30. A fusion protein comprising the polypeptide of claim 28 and a heterologous peptide.
 31. The fusion protein of claim 30, wherein the heterologous peptide is a substrate targeting moiety.
 32. The fusion protein of claim 30, wherein the heterologous peptide is a peptide tag.
 33. The fusion protein of claim 32, wherein the peptide tag is 6-His, thioredoxin, hemagglutinin, GST, or OmpA signal sequence tag.
 34. The fusion protein of claim 30, wherein the heterologous peptide is an agent that promotes polypeptide oligomerization.
 35. The fusion protein of claim 34, wherein the agent is a leucine zipper.
 36. A cellulase-substrate complex comprising the isolated polypeptide molecule of claim 28 bound to cellulose.
 37. A vector comprising the polypeptide molecule of claim 28.
 38. A host cell genetically engineered to express the polypeptide molecule of claim 28.
 39. The host cell of claim 38, wherein the host cell is a plant cell.
 40. The host cell of claim 38, wherein the host cell is a fungus.
 41. The host cell of claim 38, wherein the host cell is a bacterial cell.
 42. The host cell of claim 38, wherein the host cell is a yeast.
 43. A composition comprising the polypeptide molecule of claim 28 and a carrier.
 44. An isolated antibody that specifically binds to the polypeptide molecule of claim 28.
 45. The antibody of claim 44, wherein the antibody is a polyclonal antibody.
 46. The antibody of claim 44, wherein the antibody is a monoclonal antibody.
 47. A method for producing AviIII polypeptide, the method comprising: incubating a host cell genetically engineered to express the polynucleotide molecule of claim 28.
 48. The method of claim 47, further comprising the step of: isolating the AviIII polypeptide from the incubated host cells.
 49. The method of claim 47, wherein the host cell is a plant cell.
 50. The method of claim 47, wherein the host cell is a bacterial cell.
 51. The method of claim 47, wherein the host cell is genetically engineered to express a selectable marker.
 52. The method of claim 47, wherein the host cell further comprises a polynucleotide molecule encoding one or more polypeptide molecules selected from the glycoside hydrolase family of proteins.
 53. The method of claim 52, wherein the glycoside hydrolase is a thermostable glycoside hydrolase.
 54. A set of amplification primers for amplification of a polynucleotide molecule encoding a thermostable AviIII, comprising: two or more sequences comprising 9 or more contiguous nucleic acids derived from the polynucleotide molecule of claim 28.
 55. A probe for hybridizing to a polynucleotide encoding AviIII, comprising: a sequence of 9 or more contiguous nucleic acids derived from the polynucleotide molecule of claim 28.
 56. An assay method for the detection of a polynucleotide encoding a thermostable AviIII, comprising: amplifying a nucleic acid sequence with a set of amplification primers comprising two or more sequences of 9 or more contiguous nucleic acids derived from the polynucleotide molecule of claim 28; and correlating the amplified nucleic acid sequence with detected polynucleotide encoding a thermostable AviIII.
 57. A method for assessing the carbohydrate degradation activity of AviIII comprising: analyzing a carbohydrate degradation in the presence of AviIII and a carbohydrate degradation in the absence of AviIII on a substrate; and comparing the carbohydrate degradation in the presence of AviIII with the carbohydrate degradation in the absence of AviIII.
 58. A method for assessing the carbohydrate degradation activity of AviIII in the presence of an agent of interest comprising: analyzing a carbohydrate degradation in the presence of AviIII and a carbohydrate degradation in the presence of AviIII and the agent of interest on a substrate exposed; and comparing the carbohydrate degradation in the AviIII treated substrate with the carbohydrate degradation in the AviIII treated substrate in the presence of the agent of interest.
 59. The method of claim 58, wherein an increase in carbohydrate degradation activity in the presence of the agent of interest demonstrates stimulation of AviIII activity and wherein a decrease in carbohydrate degradation activity demonstrates inhibition of AviIII activity.
 60. The method of claim 58, wherein the carbohydrate is cellulose.
 61. The method of claim 58 wherein the agent of interest is an antibody.
 62. A method for reducing cellulose in a starting material, the method comprising: administering to the starting material an effective amount of a polypeptide molecule of claim 28.
 63. The method of claim 62, further comprising administering a second polypeptide molecule selected from the glycoside hydrolase family of proteins.
 64. The method of claim 62, wherein the starting material is agricultural biomass.
 65. The method of claim 62, wherein the starting material is municipal solid waste.